Learning One Subprocedure Per Lesson


Kurt  VanLehn 


Abstract 

Sierra is a program that learns procedures incrementally from examples,
where an example is a sequence of actions. Sierra learns by completing
explanations. Whenever the current procedure is inadequate for explaining
(parsing) the current example, Sierra formulates a new subprocedure whose
instantiation completes the explanation (parse tree). The key to Sierra's
success lies in supplying a small amount of extra information with the
examples. Instead of giving it a set of examples, under which conditions
correct learning is provably impossible, it is given a sequence of "lessons,"
where a lesson is a set of examples that is guaranteed to introduce only one
subprocedure. This permits unbiased learning, i.e., learning without a priori,
heuristic preferences concerning the outcome.


1.  Introduction 

Much research in machine learning has concentrated on induction, i.e., learning from
examples. Understanding induction is certainly one of the great intellectual challenges
of our times. Induction stands at the center of both the psychology of learning and
the philosophy of science.

More recently, induction has been heralded as a potential solution to the knowledge
acquisition problem of expert systems. However, AI has not had much success at
applying induction to practical problems. The difficulty is this: in practical settings,
only a finite set of examples can be presented to the learner, but the knowledge
representation language usually has enough expressive power that there are always
infinitely many representable generalizations that are consistent with the examples that
the learner has seen. (The formal results that justify this assertion will be reviewed
later.) That is, the examples just do not contain enough information to correctly
identify the target generalization. Consequently, the learner must guess. AI research
on induction has consisted largely of studying the efficacy and domain independence
of various heuristics for guessing. In this respect, the induction problem is quite
different from other kinds of AI problems, e.g., recognition or synthesis, where correct
answers are computable in principle, but non-AI techniques would take astronomically
long to compute them.

The inherent uncertainty of induction suggests studying forms of quasi-inductive
learning where a teacher gives some extra information to the learner in addition to
the examples. Winston argues for "the importance of good training sequences
prepared by good teachers. I think it is reasonable to believe that neither machines
nor children can be expected to learn much without them [47, pg. 6]." As
Winston [48, 49, 50] has pointed out, there is a spectrum defined by how much extra
information is provided. On the one end of the spectrum is induction, where all the
learner receives is examples. On the other end is learning by being told, where the
learner is given a complete description of the target generalization in some language.


The kind of learning studied here falls closer to the induction end of the spectrum
than the learning-by-being-told end, because very little extra information is
supplied. It could be called learning from lesson sequences, because the extra
information given to the learner is embedded in the way that the examples are
partitioned into lessons and the way the lessons are sequenced. In the last section of
this article, a variant of learning from lesson sequences will be discussed wherein
lessons are omitted and the example sequence alone carries all the extra information.

One of the most important areas for applying induction is in the learning of
procedures. Procedure learning is the central problem in programming by
examples [1, 7, 8, 40, 41], in psychological modelling of skill
acquisition [2, 3, 33, 44, 45], and in automating protocol analyses [6]. Machine
learning of procedures has been suggested as one way to solve the knowledge
acquisition problem for expert systems [31], and as a technique for modelling students
in intelligent tutoring systems [27, 25]. Undoubtedly, there are many other applications
for procedure learning systems that have not yet been studied.

An obvious testbed for the new approach of learning from lesson sequences is a
system to learn procedures. This article presents Sierra, a system that learns
procedures from lesson sequences.

1.1 The particular learning task solved by Sierra

Sierra was built for a specific application, a psychological study of the acquisition of
mathematical skills, arithmetic in particular [44, 45]. Although it would make the
reader's job easier if Sierra's techniques were displayed in a toy problem domain, this
article presents an unsimplified, accurate picture. First, the kinds of examples given to
Sierra will be described, followed by a description of lesson sequences.

Two general kinds of procedure induction problems have been addressed in the
literature. The harder one is learning a procedure from input-output pairs [1]. The one
studied here is learning from action sequences. It is assumed that the agent that
executes procedures is like a human or a robot in that its procedures manipulate both
(1) an external world that all agents have access to and (2) an internal state which
is private. The internal state might include a stack, for instance. An action sequence
consists of a sequence of state changes to the external world (or equivalently, a
sequence of world states). The learner cannot see the internal state of the teacher
during an action sequence. Action sequences are "examples" of the target
procedure's execution. The induction task is to infer a procedure from such examples.
There are two kinds of examples, positive and negative. A positive example is an
action sequence that an induced procedure should generate when it is run on the
problem that appears as the initial state in the action sequence. A negative example
is an action sequence that an induced procedure should not generate.

Sierra's input is an ordered sequence of lessons, where a lesson is an unordered
set of examples. Lessons contain only positive examples. Each lesson is marked with
a single bit. The bit is 1 if the lesson is a "normal" lesson, and 0 if it is an
"optimization" lesson.² In a moment, the semantics of lessons and their marks will be
described. The point to note here is that the extra information given to Sierra over
and beyond the set of examples consists only of (1) the partition of the examples into
lessons, (2) the ordering of the lessons, and (3) the binary mark on lessons. Although
this is very little extra information, it vastly simplifies Sierra's induction task. The next
section discusses why.
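
As a rough illustration of the input format just described, the following Python sketch
models a lesson sequence as a data structure. It is not Sierra's actual representation:
the names Lesson, ActionSequence and extra_information are hypothetical, and world
states are reduced to opaque strings for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumption: a world state is an opaque string; an action sequence (one
# example) is the tuple of external world states the teacher passes through.
ActionSequence = Tuple[str, ...]


@dataclass(frozen=True)
class Lesson:
    examples: FrozenSet[ActionSequence]  # unordered set of positive examples
    normal: bool                         # True: "normal" (bit 1); False: "optimization" (bit 0)


def extra_information(lessons: List[Lesson]):
    """Return the three things the learner gets beyond the bare examples:
    the partition into lessons, the lesson order, and the lesson marks."""
    partition = [set(lesson.examples) for lesson in lessons]
    order = list(range(len(lessons)))
    marks = [1 if lesson.normal else 0 for lesson in lessons]
    return partition, order, marks
```

The list order of the lessons carries information, whereas the examples inside one lesson
are deliberately stored as a set, since a lesson is unordered.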

Sierra learns procedures incrementally. Each lesson builds on the procedure learned
in a previous lesson. This can be best illustrated with a familiar procedure, such as
the procedure for ordinary multicolumn subtraction. Figure 1-1 shows a lesson
sequence for subtraction. There are six lessons. Although a typical Sierra lesson has
about five examples, the figure shows only the first example from each lesson. Each
example is shown as a sequence of states. Figure 1-2 shows a corresponding
sequence of procedures, where the procedures are sketched as Augmented Transition
Nets (ATNs). The procedures correspond to the lessons as follows: P0 is induced
from lesson L0; P1 from P0 and L1; P2 from P1 and L2; etc. The last procedure,
P5, is a complete, correct subtraction procedure.³


1.2  Why  lesson  sequences  simplify  procedure  induction 
An earlier discussion put learning from lesson sequences on the same scale as
learning by being told. This may seem strange; information encoded in the lesson
sequence does not seem like a linguistic expression. However, it functions in exactly
the same way. Linguistic expressions have meaning for the learner only under
interpretation. The conventions governing the interpretation are known by both the
teacher and the learner. The teacher generates explanations in such a way that the
learner will, ideally, interpret them the way the teacher wants them to be interpreted.
The same kind of convention-driven interpretation underlies the use of lesson
sequences. The formatting information (i.e., the partition, the order, the marks) is
generated by the teacher, who understands the interpretation that the learner will place
on that information. Two interpretive conventions are explored here:

1. A normal lesson introduces at most one subprocedure. Roughly put, a
   subprocedure is like one cond clause in Lisp: a test, which if true causes
   an implicit progn of function calls to be executed. A precise definition of
   "subprocedure" will be given after the knowledge representation language
   for procedures is described.

2. A normal lesson introduces material that will allow the learner to solve
   problems that it could not solve before. An optimization lesson shows the
   learner more efficient ways to solve the same class of problems that it
   could solve before the lesson. Furthermore, a normal lesson may not
   introduce optimized methods, wherein some of the procedure's calculations
   are performed internally and do not appear in the action sequence.

These two conventions are not arbitrary. They directly address two of the worst
combinatorial problems in induction. The first convention allows the learner to solve
the disjunction problem, which involves deciding when and where to place disjunctions.
(The disjunction problem will be discussed at length in section 3.) For procedures, a
disjunction is a branching (e.g., a cond) in the flow of control. Convention 1, above,
informs the learner that there will be at most one new disjunct per lesson. This cuts
down the possible places for disjunctions to a finite, small set, thereby significantly
reducing the learner's search space of possible procedures. This makes the learner's
job much easier. In fact, it makes it possible as opposed to impossible. The first
convention is named one-disjunct-per-lesson.

The second convention fulfills a similar function with regard to a second
combinatorial problem, the invisible objects problem. If two visible objects in a state
can be related by arbitrarily long chains of calculations with arbitrarily many
intermediate results, then induction is combinatorially infeasible. The intermediate results
are invisible objects because they don't appear in the examples. If they could be
seen, the combinatorics would be substantially reduced. The second convention
provides this by mandating that normal lessons explicate such chains by showing all
the intermediate results. Optimization lessons may come along later and show how to
suppress the intermediate results and perform the calculations "in one's head." The
convention is called show-work.

To evaluate the efficacy of this approach, or any approach to learning from material
prepared by a teacher, one must evaluate burdens placed on both the teacher and
the learner. One would expect there to be some work required of each, because
learning-from-lesson-sequences lies halfway between induction, where the learner does
most of the work, and learning-by-being-told, where the teacher does most of the work.

The teacher's job is to generate a lesson sequence for a given target procedure
that satisfies one-disjunct-per-lesson and show-work. This task can be accomplished
mechanically if the teacher writes down the target procedure in an appropriate
procedural language, e.g., Lisp. But writing procedures can be quite a bit of work. A
more interesting possibility is that experienced teachers generate such lesson
sequences naturally, without even realizing that their lesson sequences obey the two
conventions. This is exactly what my research on naturally occurring lesson sequences
in mathematics shows [44, 45]. Educators tend to generate well-formed lesson
sequences even though they probably are not aware of the conventions. Apparently,
they have an intuitive, unconscious appreciation of the conventions. This allows them
to generate appropriate lesson sequences without going through the work of explicating
and formalizing the procedures taught by those curricula.

This raises an interesting possibility for applications where a computer system learns
from a human user, such as programming-by-example systems (e.g., [8]) or learning
apprentice systems (e.g., [30]). Such systems usually assume that there is no
meaningful structure in the example sequence that is presented to the system.
However, if the users view themselves as teaching the system, they may order
their examples in certain ways. At the very least, they will present easy cases before
hard cases. If we had a precise definition of the ordering criterion that users tend to
employ, and if the learning system were designed to take advantage of these tacit
constraints on the instructional material, then it could recover information that is latent
in the sequential ordering. This latent information might allow it to converge faster
and more reliably on the knowledge that the user is trying to teach it.
One-disjunct-per-lesson and show-work are exactly such constraints, and they do make
Sierra a more effective learner. Further research is needed to see how domain-general
they are and to see whether there are other constraints like them.

It might seem strange that teachers should obey conventions like
one-disjunct-per-lesson and show-work without being aware of them. Looked at a
different way, it would seem strange if they didn't. The teacher-learner situation is an
extended communication act. We know that people naturally, unconsciously obey many
conventions on natural language communication acts (see, e.g., [37]). It seems entirely
likely that the teacher-learner discourse should also be formatted by conventions. In
the hope that this analogy is approximately correct, the conventions that govern
learning will be named felicity conditions, an early name for certain natural,
unconsciously-followed language conventions [4].

The point of felicity conditions is to make the learner's job easier without burdening
the teacher too much. One-disjunct-per-lesson and show-work make Sierra's job rather
simple, although not trivial. The following is a quick sketch of how it works. The
details will be presented later.

1.3 How Sierra works

Sierra's algorithm is called learning by completing explanations. It begins by trying to
parse an action sequence using the procedure as if it were a grammar, but a
grammar with data flow, conditional tests, etc. If parsing succeeds, the resulting
parse tree is a trace (i.e., a subroutine calling hierarchy). Looked at in a different way,
the parse tree constitutes an "explanation" for the action sequence. For instance, a
partial explanation for an individual action in a sequence can be read off the parse
tree by walking upwards from the action, i.e., the action was done in order to satisfy
the goals of the subprocedure that called it (which is the next node above it in the
parse tree), and the caller was executed in order to satisfy its caller, and so on. A
complete explanation for an action involves taking into account data flows and
side-effects, so explicit links for these effects are included in Sierra's parse trees. Such
links are analogous to the causal links that thread through the hierarchical structures
of explanations of, e.g., kidnapping stories [10]. So explaining an action sequence is
just parsing it. Such parsing is a form of plan recognition (e.g., [16, 20, 36]).


The conditional tests and data flows of a procedure are used to guide Sierra's
parser, significantly narrowing its search for a parse tree. However, the parser may
choose to relax such tests or ignore them entirely. This may allow it to find a parse
tree when it could not do so otherwise. If so, then a simple form of learning can be
performed. The relaxations made by the parser are edited into the procedure. For
instance, if certain predicates in a conditional test must be ignored by the parser, then
Sierra removes them from the conditional. This generalizes the conditional test. Now if
the parser redoes the narrow search, the one that obeys the constraints imposed by
conditional tests, etc., it will find the parse tree. Sierra has generalized the
procedure, allowing it to explain examples that it could not explain before. This
learning technique is similar to one form of explanation-based learning
[10, 11, 14, 32, 39]. A more interesting kind of learning occurs when it is
impossible to complete a parse no matter how much the procedure is generalized. In
this case, the learner's procedure is fundamentally incomplete. One or more new
subprocedures must be invented. Learning by completing explanations is one approach
to accomplishing the learning required in this situation.
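
The simpler of the two kinds of learning, editing the parser's relaxations back into the
procedure, can be sketched in a few lines of Python. This is only an illustration under
strong assumptions: a conditional test is taken to be a list of ground literals, the
example is a set of facts that hold, and the parser is assumed to have ignored exactly
the literals that fail. The function names are hypothetical.

```python
def strict_match(condition, facts):
    """A conjunctive test holds only if every literal is among the facts."""
    return all(literal in facts for literal in condition)


def generalize(condition, facts):
    """Return the test with the literals that fail on this example removed,
    i.e. the relaxation the parser needed, edited into the condition."""
    return [literal for literal in condition if literal in facts]


condition = [("Digit", "T"), ("Digit", "B"), ("LessThan", "T", "B")]
example = {("Digit", "T"), ("Digit", "B")}   # LessThan does not hold here
relaxed = generalize(condition, example)     # the LessThan literal is dropped
```

After the edit, the strict search succeeds on the example that previously required
relaxation, which is the sense in which the procedure has been generalized.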

Sierra uses a straightforward technique. A similar approach was employed in three
independent investigations [16, 18, 43]. The first step is to parse the action sequence
bottom-up as far as possible and top-down as far as possible. The candidate
solutions to the learning task consist of any new subprocedure (or set of new
subprocedures) that links the top-down parse to the bottom-up parse in such a way
that a complete parse tree is yielded. Even for short action sequences, there can be
millions of candidates. The challenge is to cope with this large space of candidates.
The solution used by the three independent investigators [ibid.] is to place such
strong constraints on the parsing that only one (or a few) of the possible candidates
are generated. My solution is to (1) use unconstrained parsing, (2) assume that only
one new subprocedure will be acquired, and (3) use a factored data structure (similar
to LUNAR's well-formed substring table [51] and GSP's chart [21]) to efficiently
represent the space of possible candidates. The selection of a candidate from this
space is accomplished by a collection of simple filters.

This technique for learning by completing explanations makes it simple to perform
induction across several action sequences. Each action sequence yields a space of
candidate solutions represented as a GSP-style chart. Induction amounts to
intersection of these spaces.
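
The idea that induction amounts to intersection can be illustrated with a deliberately
naive Python sketch. It assumes each candidate space is small enough to enumerate as an
explicit set of hashable candidates, which is exactly what Sierra's factored, chart-style
representation is meant to avoid; the function name and the candidate labels are
hypothetical.

```python
from functools import reduce


def induce(candidate_spaces):
    """Intersect the candidate spaces obtained from each action sequence.
    Only candidates that complete the parse of every example survive."""
    spaces = [set(space) for space in candidate_spaces]
    if not spaces:
        return set()
    return reduce(set.intersection, spaces)


# Hypothetical candidate subprocedures proposed for three examples of one lesson.
survivors = induce([{"s1", "s2", "s3"}, {"s2", "s3"}, {"s3", "s4"}])
```

Each additional example in a lesson can only shrink the surviving set, so more examples
narrow the choice that the filters must finally make.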

In principle, this technique can be used in any domain which learns hierarchical
knowledge structures from sequential examples. Thus, it should extend to learning
grammars from strings, learning story understanding schemata from stories, and
learning device models from the operation of machines.

Many of the basic intuitions behind Sierra have been presented. The remainder of
this article presents the details of how Sierra accomplishes its task. The first section
presents the knowledge representation language used for procedures. The next
section reviews the induction problem, and isolates disjunction as a key difficulty.
One-disjunct-per-lesson is proposed as a central simplifying constraint for inducing
procedures. It in turn leads to the definition of "subprocedure," in section 4, in terms
of three parts: its skeleton, its patterns and its functions. This reduces the problem of
inducing a subprocedure to three subproblems, one for each part. The subsequent
sections discuss the Sierra algorithms for, respectively, skeleton induction, pattern
induction and function induction. The final sections discuss the generality of Sierra
and speculate on the origins and applications of felicity conditions.

2. The representation of procedures

It is convenient to use a mixture of nomenclature from production systems and
And-Or graphs (AOGs).⁴ (The latter equivalence class of representations includes the one
used by Sierra. Although any of the formalisms could be used to describe Sierra's
representation, the mixture of production systems and AOGs is used here.) The
production system nomenclature is good for showing details; the AOG view is good for
showing the overall structure.

Figure 2-1a sketches the AOG view of a subtraction procedure learned by Sierra.
The nodes are called goals, and links are called rules. Rules are directed and are
always drawn running downward. The nodes just beneath a goal are called its
subgoals. Currently, there are two types of goals: AND and OR.⁵ To execute an AND
goal, all the subgoals are executed. To execute an OR goal, just one of the subgoals
is executed. AND goals are drawn with boxes around their labels. Drawings of AOGs
abbreviate goals whenever they appear more than once. For instance, OVRWRT is
called from several places in the AOG of figure 2-1a, but its subgoals are drawn only
for one of these occurrences. Although abbreviation makes this AOG look like a tree,
it is really a cyclic directed graph due to the recursive calls of MULTI and REGROUP.

AOG drawings do not indicate several kinds of information. This information is readily
visible in the production system view. Figure 2-1b shows the definitions for the
non-primitive goals in the AOG of figure 2-1a. Goals have arguments. For instance,
SUB1COL has three arguments, T, B and A. Arguments have the substitution semantics
of lambda calculus. That is, the AOG language is applicative. There are no
assignment statements. The only side-effect operators are those that change the
external state, i.e., writing a digit in an answer. The applicative property has important
consequences that will be discussed later.

4  It is well known that context-free grammars, push-down automata and basic transition nets are
equivalent. In the same way, attribute grammars [22], affix grammars [23] and ATNs are equivalent,
provided that side-effect operators (e.g., SETQ) are not used in the ATNs.

5  In order to accommodate certain empirical data, a third goal type, FOREACH, will be added as a
representation for iteration across a sequence of objects. This goal type appears in figure 1-2 as a
loop in the ATN's graph. Currently, Sierra represents iteration with tail recursions.

A goal's rules (i.e., the rules leading from it to its subgoals) are listed in its
definition. SUB1COL has three rules. Each rule has a pattern and an action.
Patterns are large, so in figure 2-1b, most patterns have been replaced by English
glosses. A pattern is a conjunction of literals (i.e., predicates or negated predicates).
Predicate arguments may be either arguments from the enclosing goal or pattern
variables. As an example, the pattern

    (Column C) ∧ (Top C T) ∧ (Digit T) ∧ (Bottom C B) ∧ (Digit B) ∧ (LessThan T B)
matches columns that require borrowing. The empty pattern always matches.
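
To make the notion of matching a conjunctive pattern concrete, here is a small
backtracking matcher in Python. It is a sketch under simplifying assumptions, not Sierra's
matcher: the problem state is an explicit set of ground facts, every argument in a
literal is treated as a variable (goal arguments would be supplied as initial bindings),
and negated literals are not handled. The cell names such as "c1" and "t1" are made up.

```python
def match(pattern, facts, bindings=None):
    """Yield every binding dict under which all literals of the pattern
    are found among the facts. The empty pattern always matches."""
    bindings = dict(bindings or {})
    if not pattern:
        yield bindings
        return
    pred, args = pattern[0][0], pattern[0][1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        new = dict(bindings)
        # Bind each variable, rejecting the fact on any conflicting binding.
        if all(new.setdefault(var, val) == val for var, val in zip(args, fact[1:])):
            yield from match(pattern[1:], facts, new)


BORROW_PATTERN = [("Column", "C"), ("Top", "C", "T"), ("Digit", "T"),
                  ("Bottom", "C", "B"), ("Digit", "B"), ("LessThan", "T", "B")]

# Two columns; only c1 has a top digit smaller than its bottom digit.
facts = {("Column", "c1"), ("Top", "c1", "t1"), ("Bottom", "c1", "b1"),
         ("Column", "c2"), ("Top", "c2", "t2"), ("Bottom", "c2", "b2"),
         ("Digit", "t1"), ("Digit", "b1"), ("Digit", "t2"), ("Digit", "b2"),
         ("LessThan", "t1", "b1")}

matches = list(match(BORROW_PATTERN, facts))
```

Here the pattern binds C, T and B to the one column that requires borrowing, while an
empty pattern yields a single empty binding regardless of the facts.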

A rule's action is a form, in the Lisp sense, which calls the rule's subgoal. The
action may pass arguments to the subgoal, often by evaluating functions. For
instance, SUB1COL's third rule has (Write A (Sub (Read T) (Read B))) as its action.
This action writes the difference of the top and bottom digits of a column in the
column's answer.
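
The evaluation of such an action form can be sketched as a tiny interpreter. The Python
below is an illustration only: forms are nested tuples, the problem state is a dictionary
from place names to symbols, and only the three operators appearing in the example
(Write, Sub, Read) are implemented. The place names and the function name evaluate are
assumptions made for the example.

```python
def evaluate(form, env, state):
    """Evaluate a Lisp-like form. `env` maps goal arguments to places;
    `state` maps places to the symbols written there."""
    if isinstance(form, str):
        return env[form]                    # an argument such as T, B or A
    op, *args = form
    vals = [evaluate(arg, env, state) for arg in args]
    if op == "Read":                        # primitive function
        return state[vals[0]]
    if op == "Sub":                         # fact function
        return vals[0] - vals[1]
    if op == "Write":                       # primitive action: changes the state
        state[vals[0]] = vals[1]
        return None
    raise ValueError(f"unknown operator: {op}")


env = {"T": "top", "B": "bottom", "A": "answer"}
state = {"top": 7, "bottom": 3}
evaluate(("Write", "A", ("Sub", ("Read", "T"), ("Read", "B"))), env, state)
```

Only Write touches the state; Read and Sub are side-effect free, which mirrors the
applicative character of the language.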

An OR goal's rules are tested in left-to-right order. The first rule whose pattern
matches is executed. The learner adds new rules at the left. Hence, the left-to-right
ordering convention corresponds to a common conflict resolution strategy in production
systems called "recency in long term memory" [26]. Because the patterns of OR rules
test whether to execute a rule, they are called test patterns. Although AND rule
patterns have the same syntax as OR rule patterns, they are not used to control which
rules are executed. The order of execution of AND rules is fixed: the rules are
executed in left-to-right order. AND rule patterns are used to retrieve information in the
current problem state so that the information can be passed to the rule's subgoal.
AND rule patterns are called fetch patterns.
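
The control regime for the two goal types can be summarized in a short Python sketch.
It is a simplification with assumed names: patterns are modelled as plain functions of
the state rather than conjunctions of literals, and the three column rules and the order
in which they are added are hypothetical.

```python
def run_or(rules, state):
    """Execute the first rule, scanning left to right, whose test pattern matches."""
    for test, action in rules:
        if test(state):
            return action(state)
    return None  # no rule applies


def run_and(rules, state):
    """Execute every rule in fixed left-to-right order; each fetch pattern
    retrieves the information handed to that rule's action."""
    return [action(fetch(state)) for fetch, action in rules]


def add_rule(rules, test, action):
    """New rules go at the left, so they take precedence over older ones."""
    rules.insert(0, (test, action))


# Hypothetical OR goal for processing a column, acquired one rule at a time.
column_rules = []
add_rule(column_rules, lambda s: True, lambda s: "difference")   # empty pattern
add_rule(column_rules, lambda s: s["bottom"] is None, lambda s: "copy top")
add_rule(column_rules,
         lambda s: s["bottom"] is not None and s["top"] < s["bottom"],
         lambda s: "borrow")
```

Because the oldest rule has an empty pattern, it acts as a default that later, more
specific rules override simply by sitting to its left.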

Any learning model that describes how knowledge is constructed from smaller units is
open to questioning about its set of primitives: what are the units that are assumed to
be present when learning begins? For completeness, table 2-1 lists the kinds of
primitives used by Sierra, and the particular ones employed to learn the procedures
discussed in this article. In addition to these primitives, the initial knowledge state
may contain non-primitive procedures as well. For instance, the initial procedure from
which the procedure of figure 2-1 was learned contained the non-primitive goal
OVRWRT, which crosses out a symbol and writes another symbol over it. The
multiplication procedure's initial knowledge state included an addition procedure.

1. Primitive actions cause a change in the current problem state. The only
   primitive actions used in mathematics are ones that write a given
   alphanumeric symbol at a given position (Write), or ones that write special
   kinds of symbols (CrossOut puts a slash over a symbol; Bar writes a bar
   under a group of symbols).

2. Fact functions return a number without changing the problem state. The
   following fact functions were employed: Add, Sub, Add1, Sub1, Mult,
   Quotient, Remainder, One (which always returns 1), Zero (which returns 0),
   and Concat (which concatenates two numbers, e.g., (Concat 1 4) returns
   14).

3. Fact predicates return true or false without changing the problem state.
   The fact predicates used were: LessThan?, Equal?, and Divisible?.

4. The primitive function, Read, returns the symbol written at a given place.

Table 2-1: The four kinds of primitives used by Sierra

The procedural representation language has been presented. The remainder of this
section is a "walk through" of the procedure of figure 2-1, which some readers may
find helpful as a way of cementing their understanding of the representation.

The root goal, START, and its subgoal, SUB, initialize column traversal to start with
the units column. 1/SUB chooses between three subgoals. MULTI is for multiple-column
problems. REGROUP is for "regrouping" exercises that don't involve any subtraction at
all. Regrouping is the part of borrowing where one digit is reduced by one and an
adjacent digit is increased by ten. This subgoal is left over from learning regrouping
separately from multi-column subtraction. (The lesson sequence of figure 1-1 lacks a
regrouping lesson, but most textbooks include one. This procedure was learned from
Heath's subtraction lesson sequence [13], which has a separate regrouping lesson.)
Normally, 1/SUB never calls REGROUP. The third goal, Write, is for single-column
subtraction problems. The "main loop" of multi-column traversal is expressed by MULTI
as a tail recursion: MULTI calls itself via its subgoal SUB/REST. SUB1COL processes a
column. It chooses between three methods for doing so. If the bottom of the
column is blank, it copies the top of the column into the answer via the subgoal
SHOW2. If the top digit of the column is less than the bottom, it calls BORROW.
Otherwise, it writes the difference of the two digits in the answer. BORROW has two
subgoals: 1/BORROW calls REGROUP, and 2/BORROW just takes the difference in the
column and writes it in the answer. REGROUP is a conjunction of borrowing into the
column that originates the borrow (BORROW/INTO) and borrowing from the adjacent
column (BORROW/FROM). In this procedure, BORROW/FROM occurs before BORROW/INTO. It
would be equally correct to reverse their order, but that is not the way that Heath
teaches them. Borrowing into a digit is just adding ten to it. Borrowing from the
next column is also easy when its top digit is non-zero: the digit is decremented. If
the digit is zero, it calls BFZ. BFZ regroups, which causes the zero to be changed to
ten, then it decrements the ten to nine.

3.  Disjunction:  an  inherent  problem  for  induction 

For many kinds of induction tasks, there are proofs that the task has no algorithmic
solution. In such proofs, the induction problem is defined by specifying a class U of
all possible generalizations that the learner can output and a class T of all possible
trainings that the learner can receive. (The "U" stands for the learner's universe of
generalizations.) There are a variety of theorems for various U and T. For instance,
one such theorem is: if U is the class of recursive functions and T is the set of all
possible training sequences that contain all possible positive and negative examples,
then there is no Turing machine that can learn any given generalization from U [35,
Proposition 5]. Such negative results guarantee that there is no straightforward
solution to induction.

The standard attack is to incorporate biases into the inducer [28]. There are two
kinds. An absolute bias is a unary predicate on generalizations that says whether or
not the generalization should ever be output by the inducer. A relative bias is a
binary predicate on generalizations that says which of the two generalizations is
preferred for output in case both generalizations are consistent with the given training.
Often, absolute biases are implemented by representing generalizations in a limited
representation language. If the generalization cannot be expressed in the language
(say, because the language lacks the appropriate primitives), then it will never be
output by the inducer. A relative bias, on the other hand, is usually defined by
comparing two formal expressions that represent the generalizations. Simplicity metrics
are a common relative bias. AI inducers often implement relative biases implicitly by
the order in which they search. Because they stop when they get to the first
generalization that is consistent with the training, the search strategies act as relative
biases. In short, biases correspond to two obvious kinds of constraints: unary and
binary predicates on generalizations.


A  non-standard  approach  is  to  employ  a  third  kind  of  constraint,  which  could  be 
called  a  manner  constraint.  A  manner  constraint  relates  a  generalization  to  the 
manner  in  which  the  training  is  presented.  A  manner  constraint  is  a  binary  predicate: 
one  argument  is  a  generalization  and  the  other  is  the  form  (syntax)  of  the  training. 

Both one-disjunct-per-lesson and show-work are manner constraints (footnote 6).

Manner constraints are a known loophole to most formal learnability results. For
instance, a major result [17] is that it is impossible to learn when (1) T employs only
positive examples and (2) U contains a generalization for every finite set of examples
and a generalization for at least one infinite set of examples, unless (3) the example
sequences in T are ordered by some primitive recursive function. That is, conveying
information with the order of the example sequences allows a learner to succeed
where it could not otherwise. The manner of example presentation is a factor that
hasn't been studied much, but is potentially quite important.

There are dozens of theorems on the learnability of certain U given certain T. In
order to obtain such results, it is necessary to be quite specific about what the U and
T are. Rather than review all these specific results, it seems more profitable for this
article to sacrifice formality in order to uncover the key components of Us and Ts that
cause induction to be impossible. In spirit, this strategy is like Newell's Knowledge
Level strategy of stepping back from the details of countless AI problems and
representations in order to analyze them from a single perspective, which he suggests
should be the perspective of first order logic [34].

In that spirit, I suggest that one of the inherent problems of induction is disjunction.
Whenever U allows a generalization to be built from the disjunction of any two other
members of U, then induction is infeasible.

More specifically, suppose that g and g' are two generalizations from U. Even
without knowing how they are represented, we can define their disjunction. Each
generalization has an extension, i.e., the set of all possible examples (instances)
consistent with the generalization. Let x and x' be the extensions of g and g',
respectively. The disjunction of g and g' is any generalization whose extension is the
union of x and x'. This is the definition of "disjunction" that will be used in this
article.

Footnote 6. The terminology could use a little clarification here. Felicity conditions
are defined as interpretive constraints on skill acquisition that people obey without
being aware of them. Presumably, one could explicitly tell students a manner
constraint, in which case it wouldn't qualify as a felicity condition. Presumably, there
could be felicity conditions that aren't manner constraints but, say, relative biases.

Disjunctions often correspond to syntactical constructions in representational languages.
In AOG representations, disjunctions correspond to OR goals. In context-free grammars,
a disjunction is present when two or more rules reduce the same non-terminal
category. In production systems, a production is potentially disjunctively related to all
the other productions: it requires careful analysis to uncover the actual disjunctive
relationships.

Induction's trouble occurs when the class of all possible generalizations admits free
disjunction. That is, the disjunction of g with g' is in the class whenever g and g'
are. When this is the case, induction acquires some strange properties that make it
seem quite unlike anything that one would want to call "learning."

Free use of disjunction allows the learner to generate absurdly specific
generalizations. One such absurdity is the trivially specific generalization: a disjunction
whose disjuncts are exactly the positive examples that the learner has received. Thus,
if the learner received positive examples a, b and c, then the disjunction (or a b c) is
the trivially specific generalization. The trivially specific generalization is not really a
generalization at all: its extension is just {a, b, c}. The learner didn't really learn; it
just remembered. This problem could be solved with an arbitrary prohibition against
trivially specific generalizations, if nothing else. But there is another problem that is
much worse.
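
The definition of disjunction and the trivially specific generalization can be made concrete with a minimal sketch. It assumes, purely for illustration, that a generalization is represented directly by its extension (a Python frozenset of examples); the function names are hypothetical and are not part of Sierra.

```python
from functools import reduce

def singleton(example):
    """A generalization whose extension is exactly one example."""
    return frozenset([example])

def disjoin(x, x_prime):
    """Disjunction: a generalization whose extension is the union of x and x'."""
    return x | x_prime

def trivially_specific(positive_examples):
    """Disjoin one singleton generalization per positive example received."""
    return reduce(disjoin, map(singleton, positive_examples), frozenset())

g = trivially_specific(["a", "b", "c"])
print(sorted(g))   # the extension contains only the examples already seen
print("d" in g)    # an unseen example is not covered: nothing was generalized
```

Under this representation the "generalization" is nothing more than a memory of the training set, which is the point the text makes.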

When disjunctions are unconstrained, the learner has to be given the complete
extension of the generalization being taught before it can reliably discriminate that
generalization from the others. To see this, first assume that for each example there
is a generalization in U whose extension is that example and only that example. This
is equivalent to assuming that the examples can be represented in the representation
language used for generalizations. For instance, a grammar consisting only of the rule
S->w is such a generalization, where S is the root category and w is a string of
terminals. This grammar's extension is the singleton set {w}. Using such singleton
generalizations and disjunction, any finite set of examples can be described by some
generalization. To get the generalization for {w1, w2}, one finds the generalization for
{w1} and for {w2}, then forms their disjunction. Since all finite sets of examples
correspond to generalizations, the learner can't tell which generalization is correct until
it is told exactly what the target generalization's extension is. This means it must be
shown all possible examples and be told which are positive examples and which are
negative examples. Such conditions, where a "learner" is shown a complete extension
of a generalization and asked to identify the generalization, hardly qualify as learning.

Suppose learning is viewed as the following rather naive search for generalizations.
This will provide another perspective on the trouble that disjunction causes. Suppose
that the learner receives an initial example and generates a single generalization.
Suppose further that the next example is a positive one that the learner's current
generalization is not consistent with. The learner has two choices: (1) to modify the
current generalization enough so that it becomes consistent with the new example, or
(2) to create a generalization specifically for the new example, then disjoin that
generalization with the current one. Roughly speaking, these two choices are available
at every step, so after N examples, there will be roughly 2^N possible generalizations
consistent with the examples. The point is simply that the set of consistent
generalizations grows as the learner is given more positive examples. It doesn't
shrink, as one would intuitively expect of learning from examples. On the other hand,
if disjunction is barred from U, then there is only one choice at each step, and the
set of generalizations does not grow unboundedly. So this learner's failure to learn
can be blamed squarely on disjunction.
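
The growth of the naive learner's hypothesis set can be sketched as follows. This is an illustration under stated assumptions, not Sierra's code: a hypothesis is modeled as a tuple of disjuncts, each disjunct is the tuple of examples it has been stretched to cover, choice (1) is modeled as stretching the most recent disjunct, and every new example is assumed to be inconsistent with the current hypothesis.

```python
def naive_hypotheses(examples):
    """Enumerate every hypothesis the naive learner could hold after all examples."""
    hypotheses = [((examples[0],),)]          # one generalization for the first example
    for ex in examples[1:]:
        next_hypotheses = []
        for hyp in hypotheses:
            # Choice 1: modify the current (last) disjunct so it also covers ex.
            next_hypotheses.append(hyp[:-1] + (hyp[-1] + (ex,),))
            # Choice 2: create a new disjunct just for ex and disjoin it.
            next_hypotheses.append(hyp + ((ex,),))
        hypotheses = next_hypotheses
    return hypotheses

for n in range(1, 6):
    print(n, len(naive_hypotheses(list(range(n)))))
```

Each additional example doubles the count in this model, which matches the "roughly 2^N" estimate in the text.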

One-disjunct-per-lesson would constrain this naive learner's search while allowing
generalizations to contain disjunctions. Most of the time, the learner would have a
single choice. However, on the first example of each lesson, it would have two
choices: it can either disjoin or not. If it chooses to disjoin, then it may not disjoin
again until the next lesson. If it chooses not to disjoin, then the twofold choice is
again available on the next example. One-disjunct-per-lesson is indeed a
straightforward solution to the disjunction problem.
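
A small sketch of how the constraint prunes the naive search. It assumes, as a simplification, that the learner faces the modify-or-disjoin choice on every example and that one-disjunct-per-lesson simply forbids more than one "disjoin" within a lesson; the function name and the choice labels are hypothetical.

```python
from itertools import product

def allowed_choice_sequences(lesson_sizes, one_disjunct_per_lesson):
    """List the choice sequences permitted over a sequence of lessons."""
    total = sum(lesson_sizes)
    allowed = []
    for seq in product(("modify", "disjoin"), repeat=total):
        start, ok = 0, True
        for size in lesson_sizes:
            lesson = seq[start:start + size]
            # The manner constraint: at most one new disjunct per lesson.
            if one_disjunct_per_lesson and lesson.count("disjoin") > 1:
                ok = False
            start += size
        if ok:
            allowed.append(seq)
    return allowed

lessons = [3, 3]   # two lessons of three examples each
print(len(allowed_choice_sequences(lessons, False)))
print(len(allowed_choice_sequences(lessons, True)))
```

In this model a lesson of k examples contributes k + 1 alternatives instead of 2^k, so the search space grows with the product of lesson lengths rather than exponentially in the number of examples.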

One-disjunct-per-lesson solves the disjunction problem by modifying the relationship
between T and U. Absolute and relative biases solve the disjunction problem by
modifying U. A relative bias partially orders the elements of U by preferring, e.g.,
generalizations with fewer disjunctions. An absolute bias removes from U all
generalizations that contain disjunctions (or contain more than 13 disjunctions, etc.). In
general, biases modify U and manner constraints modify the relationship between T
and U.

Biases are appropriate for a learning task when the learner can make strong a priori
assumptions about U. Manner constraints are appropriate when the learner can't make
assumptions about what's going to be learned, but it can make assumptions about
how it's going to be taught. For Sierra's task domain, it is inappropriate to use biases
to solve the disjunction problem. There is no reason for the learner to believe that a
procedure should have no conditionals (or fewer than 13 conditionals), so an absolute
bias against disjunction is inappropriate. There is no reason for the learner to believe
that a procedure with fewer conditionals is better, so a relative bias is also
inappropriate. However, given the felicity conditions hypothesis, there is reason to
believe that T and U are related, so a manner constraint such as one-disjunct-per-lesson
is appropriate.

The preceding comments were meant to motivate the pragmatic utility of manner
constraints in general and one-disjunct-per-lesson in particular by considering how
one-disjunct-per-lesson helps solve the disjunction problem, one of the inherent
problems of induction. Later, a similar motivation will be presented for the show-work
manner constraint, based on another inherent problem of induction, the invisible objects
problem. Manner constraints are a new technique for solving inherent problems in
induction. Previous AI learners have employed either absolute biases or relative
biases. Although any addition to AI's toolkit of techniques is welcome, manner
constraints seem particularly welcome, for they are remarkably general, as the
preceding discussion argued, and they are quite effective in reducing the complexity of
programs for learning procedures, as the remainder of this article shows.


4.  Subprocedures 

In order to make one-disjunct-per-lesson easy to implement, the AOG representation
permits disjunctions in only one place, the OR goals. Thus, a disjunct is an OR goal's
rule plus, roughly speaking, whatever that rule calls. Such fragments of AOGs are
called subprocedures. A subprocedure consists of several components:

1. A new OR rule that is placed beneath an existing OR goal. The existing
OR goal is called the parent.

2. A new AND goal, which is called by the new OR rule.

3. The new AND has one or more rules. Each rule calls a new OR goal that
has just one rule. These ORs are merely a convenience. They provide a
place for later subprocedures to attach.

4. Each such OR has a single rule that calls some existing AND goal. These
existing AND goals are called the kids.

Figure 4-1 illustrates these components of a subprocedure by showing an AOG before
and after a subprocedure has been added. This subprocedure was acquired from a
lesson that teaches how to borrow across zeros. The pre-lesson AOG (figure 4-1a) can
borrow only from non-zero digits; the post-lesson AOG (figure 4-1b, which is the same
as figure 2-1) can borrow across zeros. The goal borrow/from is the subprocedure's parent.
The new OR rule connects borrow/from to bfz. The new AND is bfz. The kids are
regroup and ovrwrt. The goals 1/bfz and 2/bfz are the new OR goals that are added as
places to attach future subprocedures.
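
The four components can be sketched as a small data structure. This is only an illustration: the AOG is simplified to a dictionary from goal names to the goals they call, rule conditions and arguments are omitted, and the goal name `decrement_top` is a hypothetical stand-in for whatever goal decrements a digit in the pre-lesson procedure.

```python
from dataclasses import dataclass

@dataclass
class Subprocedure:
    parent: str    # existing OR goal that receives the new OR rule
    new_and: str   # new AND goal called by the new OR rule
    kids: list     # existing AND goals reached through the new single-rule OR goals

def attach(aog, sub):
    """Return a copy of the AOG (goal -> list of callees) with the subprocedure added."""
    new_aog = {goal: list(callees) for goal, callees in aog.items()}
    # Component 1: a new OR rule beneath the parent, calling the new AND goal.
    new_aog.setdefault(sub.parent, []).append(sub.new_and)
    # Components 2 and 3: the new AND goal calls one fresh OR goal per kid.
    or_goals = [f"{i}/{sub.new_and}" for i in range(1, len(sub.kids) + 1)]
    new_aog[sub.new_and] = or_goals
    # Component 4: each fresh OR goal has a single rule calling an existing AND goal.
    for or_goal, kid in zip(or_goals, sub.kids):
        new_aog[or_goal] = [kid]
    return new_aog

pre_lesson = {"borrow/from": ["decrement_top"]}
bfz = Subprocedure(parent="borrow/from", new_and="bfz", kids=["regroup", "decrement_top"])
post_lesson = attach(pre_lesson, bfz)
print(post_lesson)
```

The fresh OR goals (`1/bfz`, `2/bfz`) do no work of their own in this sketch; as the text says, they exist so that later subprocedures have a place to attach.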

One-disjunct-per-lesson takes us a long way towards solving the whole procedure
induction problem. Inducing a procedure is reduced to a series of subprocedure
induction problems, one per lesson. A subprocedure induction problem reduces to
three subproblems:

• Skeletons: Skeleton induction determines the parent and the kids of the
new subprocedure. This establishes the topology of the new subprocedure
(i.e., the connectivity of the goals and rules). Because it doesn't determine
the conditions and action arguments of the new rules, skeleton induction is
like inducing only the bones and not the flesh of the subprocedure.

• Patterns: One-disjunct-per-lesson entails that a rule's conditions have no
disjunctions. This means that they can be induced by standard
disjunction-free pattern induction techniques.

• Functions: One-disjunct-per-lesson entails that disjunctions are not permitted
in the nests of functions that express the data flow in AOGs. This makes
inducing those function nests easier. However, section 7 shows that
function nest induction is still infeasible unless more constraints are added.
The show-work felicity condition is the key to Sierra's solution.

These three induction tasks will be discussed serially in the following sections. In
Sierra, skeleton induction happens first, using one pass over the lesson's examples.
Pattern and function induction occur together on a second pass over the examples.

5.  Skeleton  induction 

To see what skeleton induction involves, a computer science fixture is needed: the
trace of a procedure's execution. Figure 5-1 shows the trace tree for a correct
subtraction procedure (the procedure is shown in figures 2-1 and 4-1b) solving a bfz
(i.e., borrow from zero) problem. Each call is shown as a tree node, with its
arguments abbreviated. A trace tree is just a parse tree for the action sequence,
using the procedure as the grammar.

Roughly speaking, a skeleton is a hole in a trace tree. If the procedure is missing
the bfz goal, then the trace tree would have a hole in the middle of it, as in figure
5-2. The gap is right where the bfz node would be. From the figure, one can see
that a skeleton can be characterized by the link coming into it from above and the
links leaving it from below. Thus, a skeleton is uniquely specified by the parent and
the kids.

Almost all action sequences, including the example of figure 5-2, admit more than
one skeleton. Most of the ambiguity is due to the fact that one can almost always
make a skeleton bigger. The kids can be lower in the tree (e.g., figure 5-3); the
parent can be higher (e.g., figure 5-4). Any node that would complete an otherwise
incomplete trace tree is a legitimate skeleton.

Sierra uses two context-free grammar parsing algorithms to enumerate the skeletons.
A top-down recursive descent parser is used to find all possible parents (it is actually
just a non-deterministic version of the regular AOG interpreter). A bottom-up,
breadth-first parser is used to find all possible kids. For each parent, all possible
tuples of kids are collected, where a kid tuple is a sequence of adjacent kids that
together span the same part of the action sequence as the parent. This generates the
set of all possible skeletons. In general there can be thousands of possible skeletons
(e.g., (50 possible parents) x (3 kids per kid tuple, on average) x (30 possible kids for
each tuple position, on average) = 4500 possible skeletons). Sierra represents this set
implicitly in order to save space.
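
The pairing of parents with kid tuples can be sketched as follows. This is an explicit enumeration for illustration only (Sierra, as noted, represents the set implicitly), and it assumes that the two parsers have already produced candidate parents and kids, each tagged with the half-open span `[start, end)` of the action sequence it covers. All names are hypothetical.

```python
def kid_tuples(kids, start, end):
    """All sequences of adjacent kids that together span exactly [start, end)."""
    if start == end:
        return [()]
    tuples = []
    for name, k_start, k_end in kids:
        # A kid can come next only if it begins where the previous one ended.
        if k_start == start and k_start < k_end <= end:
            for rest in kid_tuples(kids, k_end, end):
                tuples.append((name,) + rest)
    return tuples

def skeletons(parents, kids):
    """Pair each candidate parent with every kid tuple spanning the same actions."""
    return [(p_name, kt)
            for p_name, p_start, p_end in parents
            for kt in kid_tuples(kids, p_start, p_end)]

parents = [("P1", 0, 3), ("P2", 0, 4)]
kids = [("A", 0, 2), ("B", 2, 3), ("C", 0, 3), ("D", 3, 4)]
for skeleton in skeletons(parents, kids):
    print(skeleton)
```

Even this toy input yields four skeletons from two parents and four kids, which suggests how quickly the candidate set grows on real trace trees.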

Notice that there would be many more possibilities if it weren't for
one-disjunct-per-lesson. For instance, if two new subprocedures were allowed, then
Sierra would have to collect all possible pairs of parents, each with their possible
kids, etc.

Parsing Sierra's AOGs is equivalent to parsing attribute grammars, for which there are
many algorithms [46]. Such parsing is quite simple because the language is
applicative (i.e., no side effects, no assignment statements). Once a goal's arguments
are bound by its caller, those values never change. The arguments act as a
subcategorization of the goal. This makes parsing nearly as simple as context-free
grammar parsing. If side effects were allowed, parsing would still be possible, but it
would be combinatorially costly, because the left (preceding) context of a goal would
have to be included in the goal's subcategorization.

Bottom-up parsing requires inverse execution of AOG code. Sierra must be able to
figure out the arguments of a caller from the arguments of its callees. This requires
matching fetch patterns "backwards" and inverse evaluation of functions. The first is
easy: matching patterns backwards is the same as matching them forwards.
Backwards evaluation of arithmetic functions, such as (Add x y), is accomplished by
hand-coded inverse functions that produce sets of tuples that represent possible input
values. This technique would collapse if it were not feasible to assume that examples
only use small numbers. If numbers could be arbitrarily large, then other techniques
(e.g., symbolic execution, followed by solution of a system of polynomial equations)
would have to be employed.
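
A hand-coded inverse of this kind might look like the following sketch. The bound `max_value` is an assumption standing in for "examples only use small numbers", and the function names are hypothetical rather than Sierra's.

```python
def inverse_add(result, max_value=20):
    """All (x, y) with 0 <= x, y <= max_value such that x + y == result."""
    return [(x, result - x)
            for x in range(max_value + 1)
            if 0 <= result - x <= max_value]

def inverse_sub(result, max_value=20):
    """All (x, y) with 0 <= x, y <= max_value such that x - y == result."""
    return [(x, x - result)
            for x in range(max_value + 1)
            if 0 <= x - result <= max_value]

print((7, 10) in inverse_add(17))   # (Add 7 10) is one way to obtain 17
print((17, 8) in inverse_sub(9))    # (Sub 17 8) is one way to obtain 9
print(len(inverse_add(17)))         # the candidate set stays small because inputs are bounded
```

Without the bound, both functions would have to return infinitely many tuples, which is why the text says the technique would collapse for arbitrarily large numbers.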

The techniques just mentioned generate one set of candidate skeletons per action
sequence. One-disjunct-per-lesson entails that skeleton induction can be performed
simply by intersecting these sets. Because a lesson may introduce just one
subprocedure, all the skeletons' parents must be the same. Because the new
subprocedure is disjunction-free, each skeleton's list of kids must be equal to each
other skeleton's list of kids. In particular, two kid tuples <a b c> and <a d c> cannot be
merged by using disjunction on the middle kid to form something like <a (or b d) c>.
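
Skeleton intersection itself is a plain set intersection, as the following sketch shows. It assumes each example's candidates are available as an explicit set of (parent, kid tuple) pairs; the goal names are placeholders.

```python
from functools import reduce

def intersect_skeletons(candidate_sets):
    """Keep only the skeletons that every example in the lesson admits."""
    return reduce(set.intersection, candidate_sets)

# Candidate skeletons from two examples of the same lesson.
example_1 = {("P", ("a", "b", "c")), ("Q", ("a", "d", "c"))}
example_2 = {("P", ("a", "b", "c")), ("P", ("a", "d", "c"))}

print(intersect_skeletons([example_1, example_2]))
```

Because skeletons are compared for exact equality, the tuples `("a", "b", "c")` and `("a", "d", "c")` are never merged into a disjunctive middle kid; a skeleton survives only if its parent and its whole kid tuple recur in every example.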

Skeleton intersection is powerful enough that it is usually possible to devise a lesson
that yields a unique skeleton when its examples' skeletons are intersected. However,
in some cases, this is not possible. In fact, the bfz skeleton that is our running
illustration cannot be uniquely specified by examples (footnote 8). In such cases there
is no choice but to guess. This means that relative biases for skeletons are required.

Sierra has been used to test various relative biases in order to find the ones that
explain the skeleton choices that people make. Sierra first generates all consistent
skeletons using parsing and skeleton intersection. In manual mode, it displays them in
a menu and allows the user to choose one. This is useful for exploration. In
automatic mode, Sierra partially orders the skeletons using the biases that are being
tested. Usually there is just one skeleton that is maximal in the partial order. If so,
Sierra just takes it and goes on. If there is more than one, Sierra chooses the first
one, then stores the learner's state so that it can come back later and take the other
choices. There is nothing new about this architecture, but it is remarkable how easy
it makes it to search the space of hypotheses. The current best hypothesis is that
people choose the smallest, most deeply embedded skeleton. See [44], chapters 18
and 19, for a complete discussion.
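
Selecting the maximal elements of a partial order can be sketched as below. The two numeric fields and the dominance rule are assumptions made for illustration of a "smallest, most deeply embedded" preference; they are not the biases as Sierra actually encodes them.

```python
from typing import NamedTuple

class Skeleton(NamedTuple):
    name: str
    depth: int   # how deeply the skeleton's parent is embedded in the trace tree
    size: int    # how much of the trace tree the skeleton spans

def preferred(a, b):
    """True when a is at least as deep and at least as small as b, and differs from it."""
    return (a.depth >= b.depth and a.size <= b.size
            and (a.depth, a.size) != (b.depth, b.size))

def maximal(skeletons):
    """Skeletons that no other skeleton is preferred to."""
    return [s for s in skeletons if not any(preferred(t, s) for t in skeletons)]

candidates = [
    Skeleton("small-deep", depth=3, size=2),
    Skeleton("large-shallow", depth=1, size=5),
    Skeleton("deep-but-large", depth=3, size=4),
    Skeleton("shallow-tiny", depth=1, size=1),
]
print([s.name for s in maximal(candidates)])
```

Here two candidates are incomparable, so two maximal skeletons remain; this is the situation in which, per the text, Sierra takes the first and saves its state so the other can be tried later.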

5.1 Prior solutions to the skeleton induction problem

Skeleton induction determines the procedure's goal-subgoal calling hierarchy.
Inducing such hierarchies has proved to be a tricky problem in machine learning.
Neves [33] used hierarchical examples to get his procedure learner to build hierarchy.
However, subtraction teachers rarely use such examples. Badre [5] recovered
hierarchy by assuming examples are accompanied by a written commentary. Each
instance of the same goal is assumed to be accompanied by the same verb (e.g.,
"borrow"). This is a somewhat better approximation to the kind of input that students
actually receive, but again it rests on delicate and often violated assumptions. Anzai
and Simon [3] used production compounding (chunking) to build hierarchy. However,
to account for which of many hierarchies would be learned, Anzai used domain-specific
features, such as the pyramids characteristic of subgoal states in the Tower of Hanoi
puzzle. Sierra's technique, learning by completing explanations, is less domain specific
than Anzai and Simon's technique, and requires less information from the teacher than
Neves's and Badre's techniques.

6.  Pattern  induction 

The rules in the new subprocedure must be given appropriate conditions. The new
OR rules require test patterns, and the new AND rules require fetch patterns. Because
patterns have no disjunctions nor other representational devices that trouble induction,
patterns can be induced using standard techniques. This section presents the ones
that Sierra uses. Although pattern induction is not particularly interesting from a
theoretical standpoint, it has turned out to be the bottleneck in Sierra's performance.

Parsing an action sequence with the new subprocedure installed will pair the new OR
rule with a problem state where its test pattern would have to be true in order for the
parse to go through. Such states are positive instances for the test pattern.
Similarly, parsing can uncover states where the test pattern would have to be false in
order for the parse to go through. These states are the negative instances. The test
pattern induction problem is to find a test pattern that matches all the positive
instances and none of the negative instances. Parsing also collects positive instances
for the fetch patterns, together with the values that the fetch patterns should return.
The fetch pattern induction problem is to find a pattern that matches all the positive
instances and returns the appropriate values each time.

Both induction problems are solved using version spaces [29]. A version space is
represented by a pair <S, G>, where S is the set of maximally specific patterns
consistent with the instances received so far and G is the set of maximally general
patterns consistent with the instances. To use version spaces, several
application-specific functions must be defined. The most important two are Update-S and
Update-G. Given a version space and a positive instance, Update-S generalizes the
patterns in S so that they match the instance and remain maximally specific. To
implement Update-S, Sierra uses an algorithm that finds the largest common subgraph
of two labelled directed graphs (footnote 9). Given a version space and a negative
instance, Update-G augments the patterns in G so that they do not match the negative
instance. Update-G is implemented with an algorithm for generating minimal covers of
a set (footnote 10).
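
The roles of Update-S and Update-G can be illustrated with a deliberately simplified sketch. It assumes variable-free patterns: a pattern and a problem state are both frozensets of ground literals, and a pattern matches a state when it is a subset of it. This sidesteps the subgraph matching and minimal-cover computations that Sierra's patterns with variables require, so it shows the version-space bookkeeping only. The literal names are hypothetical.

```python
def matches(pattern, instance):
    """A variable-free pattern matches when all its literals hold in the instance."""
    return pattern <= instance

def update_s(s, positive):
    """Generalize s just enough to match the positive instance (s is None initially)."""
    return positive if s is None else s & positive

def update_g(g_set, s, negative):
    """Specialize members of G that match the negative instance, using literals from s."""
    revised = set()
    for g in g_set:
        if not matches(g, negative):
            revised.add(g)
            continue
        for literal in s - negative:      # literals that exclude the negative instance
            revised.add(g | {literal})
    # Keep only maximally general patterns: drop any with a proper subset also present.
    return {g for g in revised if not any(other < g for other in revised)}

s = None
for positive in [frozenset({"T<B", "next_top_zero", "three_columns"}),
                 frozenset({"T<B", "next_top_zero", "two_columns"})]:
    s = update_s(s, positive)
g_set = update_g({frozenset()}, s, frozenset({"T<B", "two_columns"}))
print(sorted(s), [sorted(g) for g in g_set])
```

With variables in the patterns, the intersection in `update_s` becomes a largest-common-subgraph search and the literal selection in `update_g` becomes a minimal-cover search over variable mappings, which is where the cost discussed in the text arises.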

Sierra is designed so that pattern induction never finishes. A rule keeps the version
space of its pattern so that induction can continue whenever the pattern requires more
refinement. It is almost always the case that introducing a new subprocedure will
cause the patterns in older subprocedures to be modified. Those older patterns will
be "seeing" problem states that they have never "seen" before, namely the ones that
trigger the new subprocedure and the ones that the new subprocedure produces. In
order to continue to function properly, the older patterns must be generalized to match
these new states. The generalization of older patterns to match new situations is a
simple form of explanation-based learning, as the term is used by some authors
(cf. [11]).

Since pattern induction occurs so often, it needs to be fairly efficient. However, the
Update-S computation is NP-hard, and the Update-G routine calls it as a subroutine.
More specifically, if pattern P has n variables, and it is matched against a problem
state with m objects, then Update-S takes O(m^n). These combinatorics reflect the
usual AI matching problem: each variable in P can be paired with any object in the
problem state.
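
The m^n figure comes from pairing each of the n variables with any of the m objects, which a few lines can confirm. The variable and object names are placeholders.

```python
from itertools import product

def all_mappings(variables, objects):
    """Every way of pairing each pattern variable with any object in the state."""
    return [dict(zip(variables, image))
            for image in product(objects, repeat=len(variables))]

variables = ["x", "y", "z"]            # n = 3
objects = ["o1", "o2", "o3", "o4"]     # m = 4
mappings = all_mappings(variables, objects)
print(len(mappings), len(objects) ** len(variables))
```

Enumerating the mappings explicitly is feasible only for toy sizes; at the 10 to 50 objects typical of Sierra's problem states the list could not be built at all.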

One way to deal with the complexity of pattern induction is to use a small n and/or
m. For instance, blocks-world inducers (e.g., [12, 48]) typically have an n and m of
less than 5. Using a small n and m is impossible in Sierra's case. Its task domain
requires problem states with 10 to 50 objects. The larger patterns in the version
spaces have about the same number of variables as objects in the problem states
they were induced from. The combinatorics for a straightforward pattern inducer can go
as high as 50^50.

Footnote 9. Sierra's patterns are conjunctions of literals, where a literal is a predicate
or a negated predicate. Such patterns are similar to labelled directed graphs, where
the variables are nodes and the relations are labelled arcs. If all the predicates in a
pattern are binary, the correspondence is exact.

Footnote 10. Given a g from G that matches the negative instance N, and an s from S
that doesn't match N, Update-G needs to find relations in s to add to g such that the
revised g doesn't match N. Update-G first generates all possible mappings of the
variables of s into the variables of N. Each such mapping becomes an element of the
set of maps, M, to be covered. Each relation in s is paired with the subset of M that
contains the maps under which the relation is not a member of N. The relation is said
to "cover" that subset of M. A key fact is that a conjunction of relations covers at
least the union of their individual covers of M. The main goal is to find the smallest
conjunction of relations that covers all of M. Such a conjunction is not a member of N
under any map, so adding it to g will yield a pattern that doesn't match N, which is
what we want. So the algorithm for finding minimal covers of M yields candidates for
the revised g. Some of these candidates may generalize others, so Update-G has one
more task, which is to filter out the candidates that are not maximally general.

A second solution is to impose constraints on which variable-object mappings will be
considered. Sierra uses two constraints. First, pattern variables have an implicit
inequality relationship between them. That is, distinct variables must match distinct
objects. This lowers the combinatorics to the binomial coefficient function
O(n!/[(n-m)!m!]). Second, the patterns and problem states are split into two
components: a part-whole tree and the rest. In a problem state, the part-whole tree is
simply the usual parse tree for the mathematical notation. For instance, a subtraction
problem's parts are its columns and a column's parts are its digits. In patterns, there
are variables for each of the components (i.e., a variable for the problem, for each
column, and for each digit), and their part-whole relationships are kept separately from the
main pattern. Pattern induction (and pattern matching, too) considers only
variable-object mappings that do not violate the tree topologies.11 This cuts
the complexity down to O(B!^(log_B n)), where B is the branching factor of the
part-whole trees, typically about three. When this constraint is turned off in
Sierra, an Update-S that normally takes 10 seconds takes hours. This
constraint, or something like it, is a practical necessity. Even with it, most
of Sierra's time is spent running the pattern induction algorithms.12 In a
typical run, about 70% of the time is spent doing pattern inductions.
7. Function induction

Some of the rules in the new subprocedure may require function nests to be
induced for their actions. Functions are used to represent number facts, such as
(Sub 17 8) = 9. A typical function nest is (Sub (Add 7 10) 8). This section
discusses how function nests can be learned. This learning task is called
function induction. Function induction involves discovering which function or
nest of functions will yield the numbers shown in the examples. For instance,
suppose the learner already knows how to do single-column subtraction problems,
and it is taking a lesson on two-column subtraction. After seeing examples a
and b,

a.    7 2        b.    7 4
    - 4 1            - 2 1
      3 1              5 3

there are many function nests that explain where the tens column answer comes
from. Here are three candidates:

1. A10 = T10 - B10

2. A10 = T1 + B1

3. A10 = ((T10 + T1) - (B10 + B1)) - A1

where the subscripts indicate the column, and T, B and A stand for the top,
bottom and answer. The first generalization is the correct one. The second
generalization is that the tens answer is the sum of the units column's digits.
This second generalization, although consistent with examples a and b, is
inconsistent with c:

c.    3 6
    - 1 2
      2 4

Many such accidental generalizations can be eliminated by giving lots of
examples. However, generalization 3 can never be eliminated that way. It is
true of any subtraction problem. This may seem like a peculiarity of this case,
but it isn't. There are infinitely many polynomials consistent with any finite
set of input-output number tuples (in particular, there are infinitely many
n-degree polynomials consistent with any n points).
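
The consistency claims about the three candidates can be checked mechanically.
The following is a minimal Python sketch, not Sierra's implementation: the
keys T10, T1, B10, B1, A10 and A1 and the function names are assumptions
introduced only for this illustration. It tests each candidate nest against
examples a, b and c.

```python
# Each example lists the digits of a two-column subtraction problem:
# T = top, B = bottom, A = answer; 10 = tens column, 1 = units column.
EXAMPLES = {
    "a": dict(T10=7, T1=2, B10=4, B1=1, A10=3, A1=1),  # 72 - 41 = 31
    "b": dict(T10=7, T1=4, B10=2, B1=1, A10=5, A1=3),  # 74 - 21 = 53
    "c": dict(T10=3, T1=6, B10=1, B1=2, A10=2, A1=4),  # 36 - 12 = 24
}

# Candidate function nests for the tens-column answer A10 (hypothetical encoding).
CANDIDATES = {
    1: lambda e: e["T10"] - e["B10"],
    2: lambda e: e["T1"] + e["B1"],
    3: lambda e: ((e["T10"] + e["T1"]) - (e["B10"] + e["B1"])) - e["A1"],
}


def consistent(candidate, names):
    """True if the candidate predicts A10 for every named example."""
    return all(candidate(EXAMPLES[n]) == EXAMPLES[n]["A10"] for n in names)


def surviving(names):
    """Numbers of the candidates consistent with all the named examples."""
    return [k for k, f in CANDIDATES.items() if consistent(f, names)]


print(surviving(["a", "b"]))
print(surviving(["a", "b", "c"]))
```

Example c removes candidate 2, but candidate 3 survives every example of this
kind, which is the point of the argument above.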

The underlying problem has nothing to do with the fact that functions and
polynomials are the representation language of generalization. Any functional
expression can be easily converted to a relational one. For instance,
generalization 3 above could be expressed as


(AND  (PLUS  X  T10  T1)
      (PLUS  Y  B10  B1)
      (PLUS  Z  A10  A1)
      (MINUS  Z  X  Y)
      (INVISIBLE  X)
      (INVISIBLE  Y)
      (INVISIBLE  Z))


where (PLUS u v w) means u = v + w. The special relation INVISIBLE is needed
because X, Y and Z do not match any of the visible objects in the examples.
Under normal confirmation conventions for relational descriptions [19], the
variables match only visible objects, so variables that designate invisible
objects must be specially marked, and that is what INVISIBLE does.


Looked at this way, the underlying induction problem is clear: if the
representation allows invisible object designators, then there will always be
far too many generalizations consistent with any finite set of examples. Some
constraint must be placed on the use of invisible objects in examples. This
induction problem will be called the invisible objects problem.


7.1. Prior solutions to the invisible objects problem

AI's most common solution to the invisible objects problem is to ban invisible
object designators from the representation. For instance, Winston's blocks
world representation language [48] could have employed an elegant expression of
the arch concept if it allowed invisible objects:

(AND  (ISA  LINTEL  'PRISM) 

(ISA  LEG1  'BRICK) 

(ISA  LEG2  'BRICK) 

(ISA  GAP  'BRICK) 

(INVISIBLE  GAP) 

(SUPPORTS  LEG1  LINTEL) 

(SUPPORTS  GAP  LINTEL) 

(SUPPORTS  LEG2  LINTEL) 

(ABUTTS  LEG1  GAP) 

(ABUTTS  GAP  LEG2))


This says that the lintel rests on three abutting bricks, and the middle one is
invisible. Using differently shaped invisible blocks for the gap is a simple
way to describe pyramidal, trapezoidal and circular arches as well as the
rectangular arch above. However, the invisible objects problem makes it
impossible to induce such descriptions. Once invisible blocks are allowed, they
could be anywhere. The inducer would have no way of knowing whether there was
just one invisible block, the gap, or dozens lying around all jumbled up.
Winston avoids the problem by omitting invisible object designators from the
representation, and employing the relationship (NOT (TOUCHING LEG1 LEG2)) to
express the gap between the arch's legs.

Banning invisible object designators is one way to solve the invisible objects
problem. But it won't work in the mathematical domain. Invisible object
designators are needed for representing procedures such as multi-addend
addition. In the problem 1 + 3 + 5 = 9, the intermediate result, either a 4, 8
or 6, is invisible.

As mentioned earlier, absolute and relative biases are the two customary ways to
succeed at induction. An absolute bias, banning invisible objects, was just
discussed. For a relative bias, the obvious candidate is to prefer
generalizations with the fewest invisible objects. This is roughly what BACON3
does [24]. It induces physical laws given tables of idealized experimental
data. For instance, it can induce the general law for ideal gases when it is
given "experiments" such as this one:

(AND  (MOLES  1.0) 

(TEMPERATURE  300.0) 

(PRESSURE  300000.0) 

(VOLUME  0.008320)) 

This formal representation describes the experiment in the same way that
Winston's representation described a scene in the blocks world (this is not the
representation that BACON3 uses, by the way). The expression above says that
there is one mole of gas at a certain temperature and pressure occupying a
certain volume. The goal of BACON3 is to find a description that is a
generalization of the experiments that it is given. For experiments of this
type, the generalization that it induces is


(AND  (MOLES  N) 

(TEMPERATURE  T) 

(PRESSURE  P) 

(VOLUME  V) 

(TIMES  X1  P  V)

(INVISIBLE  X1)

(TIMES  X2  N  T)

(INVISIBLE  X2)

(QUOTIENT  X3  X1  X2)

(INVISIBLE  X3) 

(CONSTANT  X3)) 

That is, PV/NT is a constant. This is one way to express the ideal gas law,
which is more widely known as PV = nRT, where R = 8.32. The intermediate results
PV, NT and PV/NT do not appear in the "scene" described earlier. This is what
makes BACON3's job hard. BACON3's method for solving this induction problem is,
very roughly speaking, to guess useful invisible object descriptors and enter
their values in the scenes. It might start by forming all binary functions on
the visible objects, e.g., NT, P-V, PP, P/T, etc. Since none of these yield
values (invisible objects) that are constant across all the scenes, it tries
further compositions: NT/PV, NT+V, NTPV, etc. At this level it succeeds, since
PV/NT turns out to be the same value, 8.32, in all the scenes. Essentially,
BACON3 solves the invisible object problem by choosing a generalization with a
minimal number of invisible object designators.

BACON3 is not an incremental inducer: it assumes that it has the total example
set at the beginning. There is a reason for this. Any inducer that seeks a
generalization with the fewest invisible object designators would clearly want
to entertain generalizations with N+1 invisible object designators only after
it had disconfirmed all the generalizations with N invisible object
designators. However, adding an invisible object designator to a disconfirmed
generalization won't help it a bit. That is, if f(g(x)) doesn't match a certain
example, then wrapping an h(.) around it won't help.13 This means that failure
at the level of N invisible objects doesn't tell one anything about what
generalizations to use at the N+1 level. If the inducer is incremental, and it
is at the Nth level, and the Mth example exhausts the level, then the inducer
must start over at the N+1 level and re-examine all M examples. It would be
better off just waiting until the teacher told it that all the examples were
presented, then do a non-incremental induction. This would require a manner
constraint. The teacher would have to mark the example presentation, and the
learner would have to understand such marks as indicating that it was okay to
begin non-incremental induction. As it turns out, naturally occurring
mathematical curricula do employ a manner constraint, but it is not the one
just mentioned. The next subsection describes the one that actually occurs.

7.2. The show-work felicity condition

In almost all cases, textbooks do not require the student to do invisible
object induction. Instead, whenever the text needs to introduce a subskill that
has a mentally held intermediate result, it uses two lessons. The first
introduces the subskill using special, ad hoc notations to indicate the
intermediate results. Figures 7-1 and 7-2 show some examples. Since the
intermediate results are written out in the first lesson, the students need
guess no invisible objects in order to acquire the subskill. The learning of
this lesson may proceed as if invisible object designators were banned from the
representation language.

The second lesson teaches the subskill again, without writing the intermediate
results. The second lesson is almost always headed by the key phrase, "Here is
a shorter way to X," where X is the name of the skill. The students are being
instructed that they will be doing exactly the same work (i.e., the path of
fact functions is the same). They are left with the relatively simple problem
of figuring out how the new material relates to the material they learned in
the preceding lesson. This kind of learning is a kind of optimization. They
learn how to do the same work with less writing. So, the normal lessons are
"show work" lessons: the learner does invisible-object-free induction. The
marked lessons are "hide work" lessons: the learner does optimization learning.
The felicity condition is called "show-work / hide-work" or just show-work for
brevity.

7.3. Parallels between the invisible objects problem and the disjunction problem

The invisible objects problem and the disjunction problem are similar in many
respects. (1) Both the invisible objects problem and the disjunction problem
are impossible to solve using unbiased induction. If the class of all possible
generalizations allows free use of them, then there are far too many
generalizations consistent with any finite set of examples. Hence both the
disjunction problem and the invisible objects problem require extra
constraints. (2) Both can be solved trivially with absolute biases: bar their
representational devices from the representation language. This is not a viable
option in the mathematics domain because the target procedures use both
disjunctions and invisible objects. (3) In both cases, a relative bias that
works is to minimize the use of the respective devices (i.e., to prefer
generalizations with the fewest disjuncts and the fewest invisible object
designators). (4) For empirical reasons, minimization biases are not included
in the theory. More accurate hypotheses are based on felicity conditions, i.e.,
tacitly held manner constraints.

8.  Task  generality 

Subtraction is the only task for which I have extensive data from human
learners, so Sierra was developed primarily to simulate the learning of
subtraction. In order to make a rough assessment of its task generality, Sierra
was given lesson sequences for three new task domains, whose lesson sequences
were drawn from a popular elementary mathematics textbook [13]:

1. Addition of multi-column, multi-addend problems, such as

       3 0 7
         8 1
     + 6 2 0

2. Multiplication of multi-column problems, such as

       3 0 7
     x   2 5

3. Sixth grade algebra. The skill is to solve linear equations with one
   occurrence of one unknown, with natural number solutions. At the end of
   the sixth grade, students are expected to be able to solve 5(3x+1) = 20 but
   not 3x + 2x = 10 or 5(3x+1) = 9.

Sierra learned correct procedures for all three skills, although there are some
caveats to this assertion that will be discussed in a moment. Rather than go
through Sierra's learning in detail, this section describes the difficulties
encountered and the kinds of revisions that would be required to resolve them.

One problem is that a foreach goal type is needed. Given a sequence of objects
of the same type (e.g., a sequence of columns), a foreach would execute a
subgoal on each object. This new goal type is needed because some naturally
occurring lesson sequences are not quite right for learning the tail-recursions
that Sierra currently uses to implement loops. Although the lesson sequences
can be easily modified, that would harm Sierra's cognitive fidelity. To keep
Sierra empirically accurate, a new relative bias towards iteration is needed
for skeleton induction. This bias is most effectively implemented by including
the foreach goal type in the representation language.
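
The difference between the two loop encodings can be sketched in a few lines of
Python. This is only an illustration of the control structures, not Sierra's
procedure representation. The function names are hypothetical, columns are
assumed to be (top, bottom) digit pairs listed units first, and borrowing is
ignored.

```python
def subtract_column(column):
    """Subgoal applied to one column: (top, bottom) -> answer digit."""
    top, bottom = column
    return top - bottom


def process_recursive(columns, answers=()):
    """Tail-recursive loop: do the first column, then recurse on the rest."""
    if not columns:
        return list(answers)
    first, rest = columns[0], columns[1:]
    return process_recursive(rest, answers + (subtract_column(first),))


def process_foreach(columns):
    """Foreach loop: execute the same subgoal on each column of the sequence."""
    return [subtract_column(column) for column in columns]


columns = [(2, 1), (7, 4)]  # 72 - 41, units column first
print(process_recursive(columns))
print(process_foreach(columns))
```

Both versions compute the same answers. The foreach version states the
iteration directly, whereas the recursive version has to be recognized as a
loop from its calling structure.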

A problem was discovered during the last multiplication lesson. At the time of
the lesson, the learner can solve single-digit multiplier problems, such as (a)
below.

        3
a.    5 7        b.    5 7
    x   5            x 1 5
    2 8 5            2 8 5
                   + 5 7 0
                     8 5 5

Note the use of the scratch marks to indicate carrying. The lesson teaches how
to do two-digit multiplier problems, such as (b) above. The addition subproblem
presents no difficulties for Sierra, because the initial knowledge state for
multiplication includes a multicolumn addition procedure that was learned from
the addition lesson sequence.

The problem with the lesson is that it does not use scratch marks to indicate
carrying. This causes two difficulties for Sierra. First, the textbook does not
include an optimization lesson to teach how to suppress carry marks (using
examples such as (c) below).

c.    5 7
    x   5
    2 8 5

Perhaps teachers present such examples on their own initiative, without the
guidance of a textbook lesson. A more serious problem is that even if there
were an optimization lesson, there is no easy way to modify the multiplication
procedure in order to suppress the scratch marks. The procedure has a loop,
which iterates leftward through the top row, multiplying the row's digits by
the single-digit multiplier. A carried digit is written during one invocation
of the loop body and read during another invocation of the loop body (the next
one, in fact). For this dataflow to happen applicatively, without writing on
the page, the second invocation must somehow be called from the first
invocation. There is no way to achieve this without a complete overhaul of the
procedure's calling structure. Sierra cannot do such a major overhaul. I'm not
sure what students do because I have no data for multiplication. However, I
suspect that they do not radically overhaul their understanding of the
procedure just to suppress scratch marks. I suspect that they use their fingers
to hold the carried digit between multiplies, or they use some short-term
memory resource to do so. Implementing the latter possibility would entail
making the representation language non-applicative, which would make parsing
much more complex. On the other hand, if students use their fingers, then the
hypotheses and representation can remain intact, but the representation of the
state of a multiplication problem would have to have a "hand" added to it.
Either modification would require significant enough programming that I simply
stopped Sierra's traversal of the multiplication sequence at this lesson. It
was the last one, anyhow.
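
The two dataflows can be contrasted in a short Python sketch. It is an
illustration only, not Sierra's procedure language. The function names and the
`page` dictionary, which stands in for marks written on paper, are assumptions,
and the top row's digits are assumed to be listed units first.

```python
def multiply_with_scratch_marks(top_digits, multiplier):
    """Each loop invocation writes the carry on the page; the next one reads it."""
    page = {"carry": 0, "answer": []}  # stands in for marks on paper
    for digit in top_digits:  # iterate leftward through the top row
        product = digit * multiplier + page["carry"]  # read the scratch mark
        page["answer"].append(product % 10)
        page["carry"] = product // 10  # write the scratch mark
    if page["carry"]:
        page["answer"].append(page["carry"])
    return page["answer"]


def multiply_applicatively(top_digits, multiplier, carry=0):
    """No shared state: each invocation calls the next and hands it the carry."""
    if not top_digits:
        return [carry] if carry else []
    product = top_digits[0] * multiplier + carry
    rest = multiply_applicatively(top_digits[1:], multiplier, product // 10)
    return [product % 10] + rest


print(multiply_with_scratch_marks([7, 5], 5))  # 57 x 5, digits units first
print(multiply_applicatively([7, 5], 5))
```

In the second version the carry travels as an argument, so the invocation for
the tens digit has to be called from the invocation for the units digit. That
is the change in calling structure described above.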

The most significant task dependencies concerned notational syntax. This is not
surprising, since the four tasks employ quite different notations. Sierra's
treatment of problem states and their syntax has not yet been discussed in this
article (see [44]). The basic idea, however, is simple to present. Problem
states are represented as letters, digits and lines situated on a Cartesian
plane, and a two-dimensional context-free grammar is used to parse them. This
technique failed in some cases. For instance, given the expression "5-x", the
minus sign must be viewed two ways: as a prefix for the term following it and
as an infix operator that separates the two terms. So there are two parse trees
for "5-x," four for "2+(5-x)," and so on. Sierra's context-free grammar
technique is combinatorially explosive. A better solution would be to redesign
the parser and pattern matcher so that they keep local ambiguities local. This
might, in fact, be the first step toward an interesting theory of the
interpretation of mathematical notation.

To sum up, there were two main difficulties in getting Sierra to learn skills
other than subtraction. (1) The dataflow architecture is incomplete. Some
globally bound resource (e.g., fingers, short-term memory) is needed to do
carrying without scratch marks. (2) The notational grammars are not quite
expressive enough. People seem to view the same problem state several ways, a
facility that Sierra's grammar system does not adequately support.
9. Concluding remarks

Until now, it seemed that to be successful, an inducer had to use either
absolute or relative biases. To put it differently, successful inducers have
either been partially blind or strongly prejudiced. Sierra is a demonstration
that there is a third way. An inducer can be successful if it receives a
well-structured example sequence whose structure it understands. That is, the
example sequence obeys certain manner constraints and thereby encodes
information about the structure of the target generalization. The learner takes
advantage of these constraints in order to recover the "message" that is
encoded in the form of the example sequence. In human learning situations, if
neither teacher nor learner is aware of the manner constraints, then they
warrant the name felicity conditions. So a successful inducer is either
partially blind, strongly prejudiced, and/or felicitously taught.

9.1. Some speculations on manner constraints and felicity conditions

In a certain sense, manner constraints may be optimal strategies for knowledge
communication. For instance, in order to solve the learner's disjunction
problem, the teacher's optimal strategy would be to point to a node in the
learner's knowledge structure and say "disjoin that node with the following
subprocedure: . . ." Clearly, this is impossible. So the teacher says the next
best thing, "Disjoin some node with the following subprocedure: . . ." The
learner must figure out which node to disjoin because the teacher can't point
to it. But the learner now knows that some disjunction is necessary and that
the examples following the teacher's command will determine its contents. If it
were not for the exigencies of school scheduling, this would be perhaps the
optimal information transmission strategy. However, lessons must be about an
hour long. This means that only some of the lesson boundaries will correspond
to the teacher's command to start a new disjunction. The other lessons will
finish up a previous lesson. In short, the optimal, feasible manner constraint
for disjunctive information transmission could well be one-disjunct-per-lesson.

There is a great deal of complaining about the so-called knowledge acquisition
bottleneck in developing expert systems. It seems to be quite difficult to get
human experts to formalize their expertise as, e.g., production rules. One
often-heard solution is to have the system learn the knowledge on its own,
e.g., by discovery or by analogy (e.g., [23, 28]). I tend to agree with
Simon [42], who predicts that programming will always be the most effective way
to "educate" computers. However, if Simon and I are wrong, and machine learning
does hold promise as a solution to the knowledge acquisition bottleneck, then
examining how human experts acquire their knowledge is a good research
heuristic. Even a cursory examination shows that most human experts didn't
discover their knowledge or infer it; they learned it from a mentor, either in
school or as an apprentice.14 A good mentor is careful about selecting tasks
for the student that are appropriate for the student's current state of
knowledge. That is, the instruction is not a randomly ordered sequence of
tasks, but a carefully structured one. Yet few researchers are trying to get
expert systems to learn from the kinds of structured instruction that human
experts receive. Such a system would take advantage of the format that its
mentor places on the instruction. The present research, in its explication of
felicity conditions, should be helpful in building such a knowledge acquisition
system. Such a system will be easier for human experts to educate than present
systems because the experts, many of whom are experienced teachers, are more
familiar with formatting their knowledge as lesson sequences than as production
rules.

9.2.  Summary 

Putting speculation aside, I'll review the techniques that have been presented.
The foremost is one-disjunct-per-lesson. Because of it, Sierra's design is
quite simple. The procedure induction problem is reduced to a series of
subprocedure induction problems, one per lesson. Subprocedure induction reduces
to three subproblems: skeleton induction, pattern induction and function
induction.

Skeleton induction is performed by parsing the action sequences top-down as far
as possible and bottom-up as far as possible. In neither case will parsing
yield a complete parse tree. To complete the parse tree, a new piece of tree
structure must be built to connect the top-down parse to the bottom-up parse.
Any such structure is a candidate skeleton. This dual-parser calculation is
done on each example in the lesson, yielding one set of skeletons per example.
These sets are intersected, yielding the skeleton(s) that are consistent with
all the examples in the lesson.
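
The final intersection step can be sketched as follows. This is a minimal
Python illustration, not Sierra's code. It assumes that each candidate skeleton
has already been produced by the dual-parser calculation and is represented by
some hashable value. The skeleton labels and the function name are hypothetical.

```python
from functools import reduce


def consistent_skeletons(skeleton_sets):
    """Intersect the per-example sets of candidate skeletons for one lesson."""
    sets = [set(candidates) for candidates in skeleton_sets]
    if not sets:
        return set()
    return reduce(set.intersection, sets)


# One set of candidate skeletons per example in the lesson (hypothetical labels).
lesson = [
    {"skeleton-1", "skeleton-2", "skeleton-3"},
    {"skeleton-2", "skeleton-3"},
    {"skeleton-3", "skeleton-4"},
]
print(consistent_skeletons(lesson))
```

An empty result would mean that no single new piece of structure explains every
example, which is what happens when a lesson introduces more than one
subprocedure.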


14. Expert programmers, particularly the older ones, are notorious exceptions.
Many of them acquired their expertise without instruction. Perhaps this
accounts for the widespread belief in the knowledge engineering community that
domain experts become experts without instruction.


Pattern induction is performed by a standard technique, Mitchell's version
space algorithm [29].

Function induction can employ a brute force generate-and-test algorithm because
there is a manner constraint that simplifies the problem. The show-work
constraint says that examples must "show all the work" when introducing a new
subprocedure. That is, intermediate results must be written on the example
where the learner can "see" them. Because a composition (nest) of two primitive
functions has an intermediate result that is not written down, composite
functions cannot be introduced during a normal lesson. Consequently, the
learner only has to consider primitive functions and not compositions of
functions when it does function induction. The show-work constraint makes
function induction almost trivial.
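
A brute force generate-and-test over primitive functions might look like the
following Python sketch. It is not Sierra's algorithm. The set of primitives,
the digit names and the function name are assumptions made for illustration.
The examples are three two-column subtraction problems (72 - 41, 74 - 21 and
36 - 12), each paired with the tens-column answer that the learner must
explain from the digits that are visible on the page.

```python
from itertools import permutations
import operator

# Primitive functions the learner is assumed to know (hypothetical set).
PRIMITIVES = {"Add": operator.add, "Sub": operator.sub, "Mul": operator.mul}

# (visible digits, tens-column answer) for each example.
EXAMPLES = [
    ({"T10": 7, "T1": 2, "B10": 4, "B1": 1, "A1": 1}, 3),  # 72 - 41 = 31
    ({"T10": 7, "T1": 4, "B10": 2, "B1": 1, "A1": 3}, 5),  # 74 - 21 = 53
    ({"T10": 3, "T1": 6, "B10": 1, "B1": 2, "A1": 4}, 2),  # 36 - 12 = 24
]


def induce_primitive(examples):
    """Return every (function, arg1, arg2) that explains all the examples."""
    names = sorted(examples[0][0])
    hypotheses = []
    for fname, func in PRIMITIVES.items():
        for x, y in permutations(names, 2):
            if all(func(visible[x], visible[y]) == answer
                   for visible, answer in examples):
                hypotheses.append((fname, x, y))
    return hypotheses


print(induce_primitive(EXAMPLES[:2]))
print(induce_primitive(EXAMPLES))
```

Because only single primitive applications to visible objects are generated,
the hypothesis space is finite and small, and a few examples narrow it to the
tens-column subtraction.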

9.3. Discussion

The "per lesson" part of one-disjunct-per-lesson is slightly misleading. It
turns out that Sierra could get along just fine without lessons. The skeleton
induction algorithm is the one component that controls the introduction of
disjunctions. The algorithm fails only if an example introduces two or more
disjuncts. In particular, just by omitting skeleton intersection, Sierra could
get along fine without lesson boundaries. Its two manner constraints would
become (1) at most one disjunct per example, and (2) an example is induced
without invisible object designators unless it is marked as a same-work
example. A per-example learner might be more appropriate for some knowledge
acquisition tasks than the current per-lesson version.

It bears reiterating that the dual-parser technique that performs skeleton
induction is simple and efficient because the procedure representation language
is applicative. If the language allowed side-effects, such as storage of
information in global buffers or variables, then parsing would be much more
difficult.

As mentioned earlier, Sierra consists of three induction algorithms, for,
respectively, skeletons, patterns and functions. This three-way decomposition
may apply to learning tasks other than procedure learning. The application of
the induction algorithms is limited only by the topology of the knowledge
representations and not, of course, by what those knowledge representations
denote. I would expect these algorithms to apply, for instance, to other
learning-by-explanation tasks. Explanations of stories often feature
hierarchical structures similar to the calling structures of procedures. Such
explanations are generated by instantiating and composing schemata. These
schemata are analogous to subprocedures. Schemata often contain restrictions on
slots that are equivalent to patterns. If these equivalences hold, then
learning new schemata could be accomplished by the same techniques that are
used here to learn subprocedures. To take a second example, electronic circuits
and other engineered devices are often designed to have a hierarchy of modules.
This hierarchy corresponds to the calling hierarchy of procedures. Patterns and
functions may have analogs in device designs as well. If so, then the
techniques presented here might suffice to learn how to analyze devices. In
short, the three-way decomposition of the induction problem into induction of
hierarchies, induction of pattern-like constructs and induction of compositions
may be quite generally applicable.

Sierra's skeleton induction is a form of context-free grammar induction, since
the skeleton of a whole AOG is precisely a context-free grammar. Skeleton
induction can be used to induce grammars as long as the learner's example
sequence conforms to one-disjunct-per-example. On the other hand, if
one-disjunct-per-example is not an appropriate manner constraint for some
domain, then some other grammar induction algorithm may be employed to perform
skeleton induction, while the other two induction algorithms can remain
relatively unchanged (for reviews of grammar induction, see [15, 9]).

A beneficial consequence of one-disjunct-per-lesson is that rule patterns are pure
conjunctions. This means that a non-heuristic, complete induction algorithm, based on
Mitchell's version space technique [29], can be employed to induce the conditions.
However, it turns out to be infeasible to use just the version space technique. For
empirical reasons, Sierra must use non-toy patterns. A single pattern may have 50
variables and 200 relations. For patterns of such sizes, induction is just not practical
without further constraints on patterns. Fortunately, there are several well-motivated
constraints available in this domain. The two mentioned at the end of section 6 seem
likely to be useful outside the present domain.
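
To make the remark about pure conjunctions concrete, here is a minimal sketch in Python. It is not Sierra's algorithm: it assumes a drastically simplified setting in which each example is a set of ground relations with no variables, and the relation names are hypothetical. Under that assumption, the most specific conjunction consistent with the positive examples is simply the set of relations they all share, and if that conjunction also matches a negative example, no pure conjunction is consistent with the data.

```python
from typing import FrozenSet, Iterable, Optional

# Assumption: an example is a set of ground relation names (no variables).
Example = FrozenSet[str]


def most_specific_conjunction(positives: Iterable[Example]) -> Example:
    """Return the relations shared by every positive example (the S boundary)."""
    positives = list(positives)
    if not positives:
        raise ValueError("need at least one positive example")
    pattern = set(positives[0])
    for example in positives[1:]:
        pattern &= example
    return frozenset(pattern)


def matches(pattern: Example, example: Example) -> bool:
    """A conjunction matches an example when all its relations hold there."""
    return pattern <= example


def induce(positives: Iterable[Example],
           negatives: Iterable[Example]) -> Optional[Example]:
    """Return the most specific consistent conjunction, or None if none exists.

    Any more general conjunction is a subset of the most specific one, so if
    the most specific one matches a negative example, every candidate does.
    """
    pattern = most_specific_conjunction(positives)
    if any(matches(pattern, negative) for negative in negatives):
        return None
    return pattern


# Hypothetical relation names, chosen only for illustration.
positives = [
    frozenset({"top-less-than-bottom", "two-digit-column", "leftmost"}),
    frozenset({"top-less-than-bottom", "two-digit-column"}),
]
negatives = [frozenset({"two-digit-column"})]
result = induce(positives, negatives)
```

With variables, as in the patterns described above, the shared structure must be found by matching relations across examples rather than by plain set intersection, which is one reason patterns with 50 variables and 200 relations are costly.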

The invisible objects problem was somewhat of a surprise. It took a long time to
figure out that function induction had an inherent problem (i.e., that Sierra wasn't a
victim of incomplete sets of examples). It took even longer to become convinced that
there is no least-commitment, incremental algorithm to solve it, such as the version
space algorithm. If an inducer cannot ban invisible object designators, there seem to
be only two ways to get around the invisible-objects problem: BACON's non-incremental
induction (see section 7) or Sierra's show-work manner constraint. Perhaps more
research will find other techniques. The problem of inducing invisible object
designators has received little attention from machine learning.




References 

1. Amarel, S. Representations and modelling in problems of program formation. In
Machine Intelligence 6, B. Meltzer & D. Michie, Eds., Elsevier, New York, 1971.

2. Anderson, J.R. "Acquisition of cognitive skill". Psychological Review 89 (1982),
369-406.

3. Anzai, Y. & Simon, H.A. "The theory of learning by doing". Psychological Review
86 (1979), 124-140.

4. Austin, J.L. How to do things with words. Oxford University Press, New York,
NY, 1962.

5. Badre, N.A. Computer learning from English text. University of California at
Berkeley, Electronic Research Laboratory, Berkeley, CA, 1972. ERL-M372.

6. Bauer, M.A. A basis for the acquisition of procedures from protocols.
Proceedings of the Fourth IJCAI, 1975, pp. 226-231.

7. Biermann, A.W. "On the inference of Turing machines from sample
computations". Artificial Intelligence 3 (1972), 181-198.

8. Biermann, A.W. "The inference of regular LISP programs from examples". IEEE
Transactions on Systems, Man, and Cybernetics SMC-8, 8 (1978), 585-600.

9. Biermann, A.W. & Feldman, J.A. A survey of results in grammatical inference. In
S. Watanabe, Ed., Frontiers of pattern recognition, Academic, New York, 1972.

10. DeJong, G. Generalizations based on explanations. Proceedings of IJCAI 1981,
Los Altos, CA, 1981.

11. DeJong, G. A brief overview of explanatory schema acquisition. Proceedings of
the Third Machine Learning Workshop, 1985. To appear in: T.M. Mitchell, J.G.
Carbonell, & R.S. Michalski (eds.) Machine Learning: A Guide to Current Research,
Kluwer.

12. Dietterich, T.G. & Michalski, R.S. A comparative review of selected methods for
learning from examples. In Machine Learning: An Artificial Intelligence Approach,
R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, Eds., Tioga Press, Palo Alto, CA.

13. Dilley, C.A., Rucker, W.E. & Jackson, A.E. Heath Elementary Mathematics.
Heath, Lexington, MA, 1975.

14. Ellman, T. Explanation-based learning in logic circuit design. Proceedings of the
Third Machine Learning Workshop, 1985. To appear in: T.M. Mitchell, J.G. Carbonell,
& R.S. Michalski (eds.) Machine Learning: A Guide to Current Research, Kluwer.

15. Fu, K. & Booth, T. "Grammatical inference: introduction and survey". IEEE
Transactions on Systems, Man, and Cybernetics 5 (1975), 95-111.


16. Genesereth, M.R. The role of plans in intelligent teaching systems. In Intelligent
Tutoring Systems, Academic, New York, 1982.




17. Gold, E.M. "Language identification in the limit". Information and Control 10
(1967), 447-474.

18. Hedrick, C.L. "Learning production systems from examples". Artificial Intelligence
7 (1976), 21-49.

19. Hempel, C.G. "Studies in the logic of confirmation". Mind 54 (1945), 1-26,
97-121.

20. Johnson, L. & Soloway, E. Intention-based diagnosis of programming errors.
Proceedings of AAAI-84, 1984, pp. 162-168.

21. Kaplan, R.M. A general syntactic processor. In Natural Language Processing,
R. Rustin, Ed., Algorithmics Press, New York, 1973.

22. Knuth, D.E. "Semantics of context-free languages". Mathematical Systems Theory
2 (1968), 127-145.

23. Koster, C.H.A. Affix grammars. In J.E. Peck, Ed., ALGOL 68 Implementation,
North-Holland, Amsterdam, 1971.

24. Langley, P. Rediscovering physics with Bacon 3. Proceedings of the Sixth
IJCAI, Kaufman, Los Altos, CA, 1979.

25. Langley, P., Ohlsson, S. & Sage, S. A machine learning approach to student
modeling. CMU-RI-TR-84-7, Carnegie-Mellon University, Pittsburgh, PA, 1984.

26. McDermott, J. & Forgy, C.L. Production system conflict resolution strategies. In
Pattern-directed inference systems, Academic, New York, 1978.

27. Miller, M.L. & Goldstein, I.P. Overview of a linguistic theory of design. 383A,
MIT Artificial Intelligence Laboratory, Cambridge, MA, 1977.

28. Mitchell, T.M. The need for biases in learning generalizations. CBM-TR-117,
Rutgers University Computer Science Department, Rutgers, NJ, 1980.

29. Mitchell, T.M. "Generalization as search". Artificial Intelligence 18 (1982),
203-226.

30. Mitchell, T.M., Mahadevan, S. & Steinberg, L. A learning apprentice system for
VLSI design. Proceedings of the Third Machine Learning Workshop, 1985. To appear
in: T.M. Mitchell, J.G. Carbonell, & R.S. Michalski (eds.) Machine Learning: A Guide to
Current Research, Kluwer.

31. Mitchell, T.M., Utgoff, P.E. & Banerji, R.B. Learning problem-solving heuristics
by experimentation. In Machine Learning, R.S. Michalski, T.M. Mitchell &
J. Carbonell, Eds., Tioga Press, Palo Alto, CA, 1983.

32. Mooney, R. Generalizing explanations of narratives into schemata. Proceedings
of the Third Machine Learning Workshop, 1985. To appear in: T.M. Mitchell, J.G.
Carbonell, & R.S. Michalski (eds.) Machine Learning: A Guide to Current Research,
Kluwer.


33. Neves, D.M. Learning procedures from examples. Ph.D. Th., Department of
Psychology, Carnegie-Mellon University, Pittsburgh, PA, 1981.

34. Newell, A. "The Knowledge Level". Artificial Intelligence 18 (1982), 87-127.

35. Osherson, D.N., Stob, M. & Weinstein, S. "Ideal learning machines". Cognitive
Science 6 (1982), 277-290.

36. Rich, C. & Shrobe, H. Initial report on a LISP programmer's apprentice.
AI-TR-354, MIT AI Lab, Cambridge, MA, 1976.

37. Searle, J. Speech Acts: An essay in the philosophy of language. Cambridge
University Press, Cambridge, GB, 1969.

38. Segre, A.M. Explanation-based manipulator learning. Proceedings of the Third
Machine Learning Workshop, 1985. To appear in: T.M. Mitchell, J.G. Carbonell, & R.S.
Michalski (eds.) Machine Learning: A Guide to Current Research, Kluwer.

39. Shavlik, J. Learning classical physics. Proceedings of the Third Machine
Learning Workshop, 1985. To appear in: T.M. Mitchell, J.G. Carbonell, & R.S.
Michalski (eds.) Machine Learning: A Guide to Current Research, Kluwer.

40. Shaw, D.E., Swartout, W.R. & Green, C.C. Inferring LISP programs from
examples. Proceedings of the Fourth IJCAI, Los Altos, CA, 1975.

41. Siklossy, L. & Sykes, D.A. Automatic program synthesis from example problems.
Proceedings of IJCAI-4, Los Altos, CA, 1975.

42. Simon, H.A. Why should machines learn? In Machine Learning: An Artificial
Intelligence Approach, Tioga, Palo Alto, CA, 1983.

43. Smith, D.E. Focuser: A strategic interaction paradigm for language acquisition.
Tech Report LCSR-TR-36, Laboratory for Computer Science Research, Rutgers
University, Rutgers, NJ, 1982.

44. VanLehn, K. Felicity conditions for human skill acquisition: Validating an
AI-based theory. Tech Report CIS-21, Xerox Palo Alto Research Center, 1983.

45. VanLehn, K. Human procedural skill acquisition: Theory, model and psychological
validation. Proceedings of AAAI-83, Los Altos, CA, 1983.

46. Watt, D.A. "The parsing problem for affix grammars". Acta Informatica 8 (1977),
1-20.

47. Winston, P.H. Learning structural descriptions from examples. AI TR-231, MIT
AI Laboratory, Cambridge, MA, 1970.

48. Winston, P.H. Learning structural descriptions from examples. In The
Psychology of Computer Vision, P.H. Winston, Ed., McGraw-Hill, New York, 1975.

49. Winston, P.H. "Learning by creating transfer frames". Artificial Intelligence 10
(1978), 147-172.

50. Winston, P.H. Learning new principles from precedents and exercises. AIM 632,
MIT AI Laboratory, Cambridge, MA, 1981.




51. Woods, W.A., Kaplan, R. & Nash-Webber, B. The lunar sciences natural language
information system. BBN Rept. 2378, Bolt Beranek & Newman, Cambridge, MA,
1972.


Personnel  Analysis  Division, 
AF/MPXA 

5C360,  The  Pentagon 
Washington,  DC  20330 

Air  Force  Human  Resources  Lab 
AFHRL/MPD 

Brooks AFB, TX 78235

AFOSR,

Life  Sciences  Directorate 
Bolling  Air  Force  Base 
Washington,  DC  20332 

Dr.  Robert  Ahlers 
Code N711

Human  Factors  Laboratory 
NAVTRAEQUIPCEN 
Orlando,  FL  32813 

Dr.  Ed  Aiken 

Navy  Personnel  R&D  Center 
San Diego, CA 92152

Dr.  Earl  A.  Alluisi 
HQ.  AFHRL  (AFSC) 

Brooks AFB, TX 78235

Dr.  John  R.  Anderson 
Department  of  Psychology 
Carnegie-Mellon  University 
Pittsburgh,  PA  15213 

Dr.  Steve  Andriole 
Perceptronics, Inc.

21111  Erwin  Street 
Woodland  Hills,  CA  91367-3713 

Technical  Director,  ARI 

5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Patricia  Baggett 
University  of  Colorado 
Department  of  Psychology 
Box  345 

Boulder,  CO  80309 

Dr.  Meryl  S.  Baker 
Navy  Personnel  R&D  Center 
San  Diego,  CA  92152 


Dr. Gautam Biswas
Department of Computer Science
University of South Carolina
Columbia, SC 29208

Dr.  John  Black 
Yale  University 
Box  11A,  Yale  Station 
New Haven, CT 06520

Arthur  S.  Blaiwes 
Code  N711 

Naval  Training  Equipment  Center 
Orlando,  FL  32813 

Dr.  Jeff  Bonar 
Learning  R&D  Center 
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr.  Richard  Braby 
NTEC  Code  10 
Orlando,  FL  32751 

Dr.  Robert  Breaux 
Code  N-095R 

NAVTRAEQUIPCEN
Orlando, FL 32813

Dr. Ann Brown

Center  for  the  Study  of  Reading 
University  of  Illinois 
51  Gerty  Drive 
Champaign, IL 61820

Dr.  John  S.  Brown 
XEROX  Palo  Alto  Research 
Center 

3333  Coyote  Road 
Palo  Alto,  CA  94304 

Dr. Bruce Buchanan
Computer Science Department
Stanford  University 
Stanford,  CA  94305 

Dr. Patricia A. Butler
NIE Mail Stop 1806
1200 19th St., NW
Washington, DC 20208



Dr.  Robert  Calfee 
School  of  Education 
Stanford  University 
Stanford,  CA  94305 


Dr.  Jaime  Carbonell 
Carnegie-Mellon  University 
Department  of  Psychology 
Pittsburgh,  PA  15213 


Dr.  Susan  Carey 
Harvard  Graduate  School  of 
Education 

337  Gutman  Library 
Appian  Way 
Cambridge, MA 02138


Dr.  Pat  Carpenter 
Carnegie-Mellon  University 
Department  of  Psychology 
Pittsburgh,  PA  15213 


Dr.  Robert  Carroll 
NAVOP  01B7 

Washington,  DC  20370 


Dr.  Fred  Chang 

Navy Personnel R&D Center

Code  51 

San  Diego,  CA  92152 


Dr.  Davida  Charney 
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley Park
Pittsburgh,  PA  15213 


Dr.  Eugene  Charniak 
Brown  University 
Computer  Science  Department 
Providence,  RI  02912 


Dr.  Michelene  Chi 
Learning R&D Center
University  of  Pittsburgh 
3939  O’Hara  Street 
Pittsburgh,  PA  15213 


Dr.  Susan  Chipman 
Code  442PT 

Office  of  Naval  Research 
800 N. Quincy St.
Arlington, VA 22217-5000


Mr.  Raymond  E.  Christal 
AFHRL/MOE 

Brooks  AFB,  TX  78235 


Dr.  Yee-Yeen  Chu 
Perceptronics, Inc.

21111  Erwin  Street 
Woodland  Hills,  CA  91367-3713 




Dr.  William  Clancey 
Computer  Science  Department 
Stanford  University 
Stanford, CA 94306


Scientific  Advisor 
to the DCNO (MPT)

Center  for  Naval  Analysis 
2000  North  Beauregard  Street 
Alexandria,  VA  22311 


Chief  of  Naval  Education 
and  Training 
Liaison  Office 

Air  Force  Human  Resource  Laboratory 
Operations  Training  Division 
Williams AFB, AZ 85224


Assistant Chief of Staff
for Research, Development,
Test, and Evaluation
Naval Education and
Training Command (N-5)
NAS Pensacola, FL 32508

Dr. Allan M. Collins
Bolt Beranek & Newman, Inc.
50 Moulton Street
Cambridge, MA 02138

Dr. Stanley Collyer
Office of Naval Technology
800 N. Quincy Street
Arlington, VA 22217

CTB/McGraw-Hill Library
2500  Garden  Road 
Monterey,  CA  93940 


CDR  Mike  Curran 
Office  of  Naval  Research 
800 N. Quincy St.

Code  270 

Arlington, VA 22217-5000


Bryan  Dallman 
AFHRL/LRT 

Lowry  AFB,  CO  80230 

Dr.  Charles  E.  Davis 
Personnel  and  Training  Research 
Office  of  Naval  Research 
Code  442PT 

800  North  Quincy  Street 
Arlington,  VA  22217-5000 

Defense Technical
Information Center
Cameron  Station,  Bldg  5 
Alexandria,  VA  22314 
Attn:  TC 
(12  Copies) 

Dr.  Thomas  M.  Duffy 
Communications  Design  Center 
Carnegie-Mellon University
Schenley  Park 
Pittsburgh,  PA  15213 

Edward  E.  Eddowes 
CNATRA  N301 
Naval  Air  Station 
Corpus  Christi,  TX  78419 

Dr.  John  Ellis 

Navy  Personnel  R&D  Center 

San Diego, CA 92152

Dr.  Richard  Elster 
Deputy  Assistant  Secretary 
of  the  Navy  (Manpower) 
Washington, DC 20350

Dr.  Susan  Embretson 
University  of  Kansas 
Psychology  Department 
Lawrence,  KS  66045 

Dr.  Randy  Engle 
Department  of  Psychology 
University  of  South  Carolina 
Columbia,  SC  29208 

Dr.  William  Epstein 
University  of  Wisconsin 
W.  J.  Brogden  Psychology  Bldg. 
1202 W. Johnson Street
Madison,  WI  53706 


ERIC  Facility-Acquisitions 
4833 Rugby Avenue
Bethesda, MD 20014

Dr.  K.  Anders  Ericsson 
University  of  Colorado 
Department  of  Psychology 
Boulder, CO 80309

Edward  Esty 

Department  of  Education,  OERI 
MS  40 

1200 19th St., NW
Washington,  DC  20208 

Dr.  Beatrice  J.  Farr 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Marshall  J.  Farr 
2520  North  Vernon  Street 
Arlington,  VA  22207 

Dr.  Pat  Federico 
Code  511 
NPRDC 

San  Diego,  CA  92152 

Dr.  Jerome  A.  Feldman 
University of Rochester
Computer  Science  Department 
Rochester,  NY  14627 

Dr.  Paul  Feltovich 
Southern Illinois University
School of Medicine
Medical Education Department
P.O. Box 3926
Springfield,  IL  62708 

Mr.  Wallace  Feurzeig 
Educational Technology
Bolt Beranek & Newman
10 Moulton St.
Cambridge, MA 02238

Dr. Craig I. Fields
ARPA
1400 Wilson Blvd.
Arlington, VA 22209


Dr.  Linda  Flower 
Carnegie-Mellon  University 
Department  of  English 
Pittsburgh,  PA  15213 

Dr.  Ken  Forbus 

Department  of  Computer  Science 
University  of  Illinois 
Champaign,  IL  61820 

Dr.  Carl  H.  Frederiksen 
McGill  University 
3700 McTavish Street
Montreal,  Quebec  H3A  1Y2 
CANADA 

Dr.  John  R.  Frederiksen 
Bolt  Beranek  &  Newman 
50  Moulton  Street 
Cambridge,  MA  02138 

Dr.  Norman  Frederiksen 
Educational  Testing  Service 
Princeton,  NJ  08541 

Dr.  R.  Edward  Geiselman 
Department  of  Psychology 
University  of  California 
Los  Angeles,  CA  90024 

Dr.  Michael  Genesereth 
Stanford  University 
Computer  Science  Department 
Stanford,  CA  94305 

Dr.  Dedre  Gentner 
University  of  Illinois 
Department  of  Psychology 
603  E.  Daniel  St. 

Champaign,  IL  61820 

Dr.  Don  Gentner 
Center  for  Human 

Information  Processing 
University  of  California 
La  Jolla,  CA  92093 

Dr.  Robert  Glaser 
Learning Research
& Development Center
University  of  Pittsburgh 
3930  O'Hara  Street 
Pittsburgh,  PA  15260 


Dr.  Arthur  M.  Glenberg 
University  of  Wisconsin 
W.  J.  Brogden  Psychology  Bldg. 
1202  W.  Johnson  Street 
Madison,  WI  53706 

Dr.  Marvin  D.  Glock 
13  Stone  Hall 
Cornell  University 
Ithaca,  NY  14853 

Dr.  Gene  L.  Gloye 
Office  of  Naval  Research 
Detachment 
1030  E.  Green  Street 
Pasadena,  CA  91106-2485 

Dr.  Sam  Glucksberg 
Princeton  University 
Department  of  Psychology 

Green  Hall 

Princeton,  NJ  08540 

Dr.  Joseph  Goguen 
Computer  Science  Laboratory 
SRI  International 
333  Ravenswood  Avenue 
Menlo  Park,  CA  94025 

Dr.  Sherrie  Gott 
AFHRL/MODJ

Brooks  AFB,  TX  78235 

Dr.  Richard  H.  Granger 
Department  of  Computer  Science 
University of California, Irvine
Irvine, CA 92717

Dr. Wayne Gray
Army Research Institute
5001 Eisenhower Avenue
Alexandria, VA 22333

Dr.  James  G.  Greeno 
University of California
Berkeley,  CA  94720 

Dr.  Henry  M.  Halff 
Halff Resources, Inc.
4916 33rd Road, North
Arlington,  VA  22207 


Dr.  David  R.  Lambert 
Naval  Ocean  Systems  Center 
Code  H41 T 

271  Catalina  Boulevard 
San Diego, CA 92152

Dr.  Pat  Langley 
University  of  California 
Department  of  Information 
and  Computer  Science 
Irvine,  CA  92717 

M.  Diane  Langston 
Communications  Design  Center 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Kathleen  LaPiana 
Naval  Health  Sciences 

Education  and  Training  Command 
Naval  Medical  Command, 

National  Capital  Region 
Bethesda,  MD  20814-5022 

Dr.  Jill  Larkin 
Carnegie-Mellon  University 
Department  of  Psychology 
Pittsburgh,  PA  15213 

Dr.  Robert  Lawler 
Information  Sciences,  FRL 
GTE  Laboratories,  Inc. 

40  Sylvan  Road 
Waltham,  MA  02254 

Dr.  Paul  E.  Lehner 
PAR  Technology  Corp. 

7926  Jones  Branch  Drive 
Suite  170 
McLean,  VA  22102 

Dr.  Alan  M.  Lesgold 
Learning  R&D  Center 
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr.  Jim  Levin 
University  of  California 
Laboratory  for  Comparative 
Human  Cognition 
D003A 

La  Jolla,  CA  92093 


Dr.  Clayton  Lewis 
University  of  Colorado 
Department  of  Computer  Science 
Campus  Box  430 
Boulder,  CO  80309 

Science  and  Technology  Division 
Library  of  Congress 
Washington,  DC  20540 

Dr.  Charlotte  Linde 
SRI  International 
333  Ravenswood  Avenue 
Menlo  Park,  CA  94025 

Dr.  Marcia  C.  Linn 
Lawrence  Hall  of  Science 
University  of  California 
Berkeley,  CA  94720 

Dr. Don Lyon
P.O. Box 44
Higley, AZ 85236

Dr.  Jane  Malin 
Mail  Code  SR  111 
NASA  Johnson  Space  Center 
Houston,  TX  77058 

Dr.  William  L.  Maloy  (02) 

Chief  of  Naval  Education 
and  Training 
Naval  Air  Station 
Pensacola,  FL  32508 

Dr.  Sandra  P.  Marshall 
Department  of  Psychology 
University  of  California 
Santa  Barbara,  CA  93106 

Dr.  Manton  M.  Matthews 
Department of Computer Science
University of South Carolina
Columbia, SC 29208

Dr.  Richard  E.  Mayer 
Department  of  Psychology 
University  of  California 
Santa  Barbara,  CA  93106 


Dr.  James  McBride 
Psychological  Corporation 
c/o  Harcourt,  Brace, 
Jovanovich Inc.

1250  West  6th  Street 
San  Diego,  CA  92101 

Dr.  James  McMichael 
Navy Personnel R&D Center
San  Diego,  CA  92152 

Dr.  Barbara  Means 
Human  Resources 

Research  Organization 
1100  South  Washington 
Alexandria,  VA  22314 

Dr.  Arthur  Melmed 

U.  S.  Department  of  Education 

724  Brown 

Washington,  DC  20208 

Dr. Al Meyrowitz
Office of Naval Research
Code 433
800 N. Quincy
Arlington, VA 22217-5000

Dr.  George  A.  Miller 
Department  of  Psychology 
Green  Hall 

Princeton  University 
Princeton,  NJ  08540 

Dr.  Lance  A.  Miller 
IBM  Thomas  J.  Watson 
Research  Center 
P.O. Box 218

Yorktown  Heights,  NY  10598 

Dr.  Andrew  R.  Molnar 
Scientific  and  Engineering 
Personnel  and  Education 
National  Science  Foundation 
Washington, DC 20550

Dr.  William  Montague 

NPRDC  Code  13 

San  Diego,  CA  92152 


Dr.  Allen  Munro 
Behavioral  Technology 
Laboratories  -  USC 
1845  S.  Elena  Ave.,  4th  Floor 
Redondo  Beach,  CA  90277 

Spec. Asst. for Research, Experimental
& Academic Programs,
NTTC  (Code  016) 

NAS  Memphis  (75) 

Millington,  TN  38054 

Dr.  Richard  E.  Nisbett 
University  of  Michigan 
Institute  for  Social  Research 
Room  5261 

Ann  Arbor,  MI  48109 

Dr.  Donald  A.  Norman 
Institute  for  Cognitive  Science 
University  of  California 
La  Jolla,  CA  92093 

Director,  Training  Laboratory, 
NPRDC  (Code  05) 

San  Diego,  CA  92152 

Director,  Manpower  and  Personnel 
Laboratory, 

NPRDC  (Code  06) 

San  Diego,  CA  92152 

Director,  Human  Factors 

& Organizational Systems Lab,
NPRDC  (Code  07) 

San  Diego,  CA  92152 

Fleet  Support  Office, 

NPRDC (Code 301)

San  Diego,  CA  92152 

Library,  NPRDC 

Code  P201L 

San  Diego,  CA  92152 

Commanding  Officer, 

Naval  Research  Laboratory 
Code  2627 

Washington,  DC  20390 


Dr. Ronald K. Hambleton
Laboratory  of  Psychometric  and 
Evaluative  Research 
University  of  Massachusetts 
Amherst,  MA  01003 

Dr.  Cheryl  Hamel 
NTEC 

Orlando, FL 32813

Stevan Harnad

Editor,  The  Behavioral  and 
Brain  Sciences 
20  Nassau  Street,  Suite  240 
Princeton,  NJ  08540 

Mr.  William  Hartung 
PEAM  Product  Manager 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Wayne  Harvey 
SRI  International 
333  Ravenswood  Ave. 

Room  B-S324 

Menlo  Park,  CA  94025 

Prof. John R. Hayes
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Barbara  Hayes-Roth 
Department  of  Computer  Science 
Stanford  University 
Stanford, CA 94305

Dr.  Frederick  Hayes-Roth 

Teknowledge 

525  University  Ave. 

Palo  Alto,  CA  94301 

Dr.  Joan  I.  Heller 
Graduate  Group  in  Science  and 
Mathematics  Education 
c/o  School  of  Education 
University  of  California 
Berkeley,  CA  94720 


Dr.  Geoffrey  Hinton 
Computer  Science  Department 
Carnegie-Mellon  University 
Pittsburgh,  PA  15213 

Dr.  Jim  Hollan 
Code  51 

Navy  Personnel  R  &  D  Center 
San  Diego,  CA  92152 

Dr.  John  Holland 
University  of  Michigan 
2313  East  Engineering 
Ann  Arbor,  MI  48109 

Dr.  Melissa  Holland 
Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Keith  Holyoak 
University  of  Michigan 
Human  Performance  Center 
330  Packard  Road 
Ann  Arbor,  MI  48109 

Dr.  Ed  Hutchins 

Navy  Personnel  R&D  Center 

San  Diego,  CA  92152 

Dr.  Dillon  Inouye 
WICAT  Education  Institute 
Provo, UT 84057

Dr.  S.  Iyengar 
Stanford  University 
Department  of  Psychology 
Bldg.  4201  —  Jordan  Hall 
Stanford,  CA  94305 

Dr.  Zachary  Jacobson 

Bureau  of  Management  Consulting 

305  Laurier  Avenue  West 

Ottawa,  Ontario  K1A  0S5 

CANADA 

Dr.  Robert  Jannarone 
Department of Psychology
University  of  South  Carolina 
Columbia,  SC  29208 


Dr.  Claude  Janvier 

Directeur,  CIRADE 

Université du Québec à Montréal

Montreal, Quebec H3C 3P8

CANADA 

Margaret  Jerome 

c/o  Dr.  Peter  Chandler 

83,  The  Drive 

Hove 

Sussex 

UNITED  KINGDOM 

Dr.  Joseph  E.  Johnson 
Assistant  Dean  for 
Graduate  Studies 

College  of  Science  and  Mathematics 
University  of  South  Carolina 
Columbia,  SC  29208 

Dr.  Douglas  H.  Jones 
Advanced  Statistical 

Technologies  Corporation 
10  Trafalgar  Court 
Lawrenceville,  NJ  08148 

Dr.  Marcel  Just 
Carnegie-Mellon University
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Milton  S.  Katz 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Scott  Kelso 
Haskins Laboratories,

270  Crown  Street 
New  Haven,  CT  06510 

Dr.  Norman  J.  Kerr 
Chief  of  Naval  Education 
and  Training 
Code  00A2 
Naval  Air  Station 
Pensacola,  FL  32508 


Dr.  Dennis  Kibler 
University  of  California 
Department  of  Information 
and  Computer  Science 
Irvine, CA 92717

Dr.  David  Kieras 
University  of  Michigan 
Technical  Communication 
College  of  Engineering 
1223  E.  Engineering  Building 
Ann  Arbor,  MI  48109 

Dr.  Peter  Kincaid 
Training  Analysis 

& Evaluation Group
Department  of  the  Navy 
Orlando,  FL  32813 

Dr.  David  Klahr 
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Mazie  Knerr 
Program Manager
Training Research Division
HumRRO 

1100  S.  Washington 
Alexandria,  VA  22314 

Dr.  Janet  L.  Kolodner 
Georgia  Institute  of  Technology 
School  of  Information 
&  Computer  Science 
Atlanta,  GA  30332 

Dr. Kenneth Kotovsky
Department of Psychology
Community College of
Allegheny  County 
800  Allegheny  Avenue 
Pittsburgh,  PA  15233 

Dr. Benjamin Kuipers
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139

Dr.  Patrick  Kyllonen 
AFHRL/MOE 

Brooks AFB, TX 78235


Dr.  Harry  F.  O'Neil,  Jr. 
Training  Research  Lab 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr.  Stellan  Ohlsson 
Learning R&D Center
University of Pittsburgh
3939 O'Hara Street
Pittsburgh, PA 15213

Director,  Technology  Programs, 
Office  of  Naval  Research 
Code  200 

800  North  Quincy  Street 
Arlington, VA 22217-5000

Director, Research Programs,

Office  of  Naval  Research 
800  North  Quincy  Street 
Arlington,  VA  22217-5000 

Mathematics  Group, 

Office  of  Naval  Research 
Code 411MA

800  North  Quincy  Street 
Arlington,  VA  22217-5000 

Office  of  Naval  Research, 

Code  433 

800  N.  Quincy  Street 
Arlington,  VA  22217-5000 

Office  of  Naval  Research, 

Code  442 

800  N.  Quincy  St. 

Arlington,  VA  22217-5000 

Office  of  Naval  Research, 

Code  442EP 

800  N.  Quincy  Street 
Arlington,  VA  22217-5000 

Office  of  Naval  Research, 

Code  442PT 

800  N.  Quincy  Street 
Arlington,  VA  22217-5000 
(6  Copies) 


Special  Assistant  for  Marine 
Corps  Matters, 

ONR Code 100M
800  N.  Quincy  St. 

Arlington,  VA  22217-5000 

Psychologist 
ONR Branch Office
1030  East  Green  Street 
Pasadena,  CA  91101 

Dr.  Judith  Orasanu 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr.  Jesse  Orlansky 
Institute  for  Defense  Analyses 
1801  N.  Beauregard  St. 

Alexandria,  VA  22311 

Prof.  Seymour  Papert 
20C-109 

Massachusetts  Institute 
of  Technology 
Cambridge,  MA  02139 

Lt. Col. (Dr.) David Payne
AFHRL 

Brooks  AFB,  TX  78235 

Dr.  Douglas  Pearse 

DCIEM 

Box  2000 

Downsview,  Ontario 
CANADA 

Dr.  Nancy  Pennington 
University  of  Chicago 
Graduate  School  of  Business 
1101  E.  58th  St. 

Chicago, IL 60637

Military  Assistant  for  Training  and 
Personnel  Technology, 

OUSD (R & E)
Room 3D129, The Pentagon
Washington,  DC  20301 


Dr.  David  N.  Perkins 
Educational  Technology  Center 
337  Gutman  Library 
Appian  Way 
Cambridge,  MA  02138 

Administrative  Sciences  Department, 
Naval  Postgraduate  School 
Monterey,  CA  93940 

Department  of  Operations  Research, 
Naval  Postgraduate  School 
Monterey,  CA  93940 

Department  of  Computer  Science, 
Naval  Postgraduate  School 
Monterey,  CA  93940 

Dr.  Tjeerd  Plomp 

Twente  University  of  Technology 

Department  of  Education 

P.O. Box 217

7500  AE  ENSCHEDE 

THE  NETHERLANDS 

Dr. Martha Polson
Department of Psychology
Campus Box 346
University of Colorado
Boulder, CO 80309

Dr. Peter Polson
University  of  Colorado 
Department  of  Psychology 
Boulder,  CO  80309 

Dr.  Steven  E.  Poltrock 
MCC 

9430  Research  Blvd. 

Echelon  Bldg  #1 
Austin,  TX  78759-6509 

Dr.  Harry  E.  Pople 
University  of  Pittsburgh 
Decision  Systems  Laboratory 
1360  Scaife  Hall 
Pittsburgh,  PA  15261 

Dr.  Joseph  Psotka 
ATTN: PERI-1C
Army Research Institute
5001 Eisenhower Ave.
Alexandria, VA 22333


Dr.  Lynne  Reder 
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  James  A.  Reggia 
University  of  Maryland 
School  of  Medicine 
Department  of  Neurology 
22  South  Greene  Street 
Baltimore,  MD  21201 

Dr.  Fred  Reif 
Physics  Department 
University  of  California 
Berkeley,  CA  94720 

Dr.  Lauren  Resnick 
Learning R&D Center
University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh,  PA  15213 

Dr.  Mary  S.  Riley 
Program  in  Cognitive  Science 
Center  for  Human  Information 
Processing 

University  of  California 
La  Jolla,  CA  92093 

Dr.  Andrew  M.  Rose 
American  Institutes 
for  Research 

1055  Thomas  Jefferson  St.,  NW 
Washington, DC 20007

Dr.  William  B.  Rouse 
Georgia  Institute  of  Technology 
School of Industrial & Systems
Engineering 
Atlanta,  GA  30332 

Dr.  Donald  Rubin 
Statistics  Department 
Science  Center,  Room  608 
1  Oxford  Street 
Harvard  University 
Cambridge,  MA  02138 

Dr.  Lawrence  Rudner 
403  Elm  Avenue 
Takoma  Park,  MD  20012 


Dr.  Michael  J.  Samet 
Perceptronics, Inc.
6271  Variel  Avenue 
Woodland  Hills,  CA  91364 

Dr.  Robert  Sasmor 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  Roger  Schank 

Yale  University 

Computer  Science  Department 

P.O. Box 2158

New  Haven,  CT  06520 

Dr.  Alan  H.  Schoenfeld 
University  of  California 
Department  of  Education 
Berkeley,  CA  94720 

Dr.  Janet  Schofield 
Learning R&D Center
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr.  Judith  Segal 

Room 819F

NIE 

1200  19th  Street  N.W. 
Washington,  DC  20208 

Dr.  Ramsay  W.  Selden 
NIE 

Mail  Stop  1241 
1200  19th  St.,  NW 
Washington,  DC  20208 

Dr.  Michael  G.  Shafto 
ONR Code 442PT
800  N.  Quincy  Street 
Arlington,  VA  22217-5000 

Dr.  Sylvia  A.  S.  Shafto 
National  Institute  of  Education 
1200  19th  Street 
Mail  Stop  1806 
Washington,  DC  20208 

Dr.  T.  B.  Sheridan 

Dept. of Mechanical Engineering

MIT 

Cambridge,  MA  02139 


Dr.  Ted  Shortliffe 
Computer  Science  Department 
Stanford  University 
Stanford,  CA  94305 

Dr. Lee Shulman
Stanford  University 
1040  Cathcart  Way 
Stanford,  CA  94305 

Dr.  Miriam  Shustack 
Code  51 

Navy  Personnel  R  &  D  Center 
San  Diego,  CA  92152 

Dr.  Robert  S.  Siegler 
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Herbert  A.  Simon 
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Zita  M  Simutis 
Instructional  Technology 
Systems  Area 
ARI 

5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr.  H.  Wallace  Sinaiko 
Manpower  Research 

and  Advisory  Services 
Smithsonian  Institution 
801  North  Pitt  Street 
Alexandria,  VA  22314 

Dr.  Derek  Sleeman
Stanford  University 
School  of  Education 
Stanford,  CA  94305 

Dr.  Edward  E.  Smith 
Bolt  Beranek  &  Newman,  Inc.
50  Moulton  Street 
Cambridge,  MA  02138 


Dr.  Alfred  F.  Smode
Senior  Scientist
Code  7B
Naval  Training  Equipment  Center
Orlando,  FL  32813

Dr.  Richard  Snow
Liaison  Scientist
Office  of  Naval  Research
Branch  Office,  London
Box  39
FPO  New  York,  NY  09510

Dr.  Elliot  Soloway
Yale  University
Computer  Science  Department
P.O.  Box  2158
New  Haven,  CT  06520

Dr.  Richard  Sorensen 
Navy  Personnel  R&D  Center
San  Diego,  CA  92152 

James  J.  Staszewski 
Research  Associate 
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Marian  Stearns
SRI  International
333  Ravenswood  Ave.
Room  B-S324
Menlo  Park,  CA  94025

Dr.  Robert  Sternberg 
Department  of  Psychology 
Yale  University 
Box  11A,  Yale  Station
New  Haven,  CT  06520 

Dr.  Albert  Stevens
Bolt  Beranek  &  Newman,  Inc.
10  Moulton  St.
Cambridge,  MA  02238

Dr.  Paul  J.  Sticha
Senior  Staff  Scientist
Training  Research  Division
HumRRO
1100  S.  Washington
Alexandria,  VA  22314


Dr.  Thomas  Sticht
Navy  Personnel  R&D  Center
San  Diego,  CA  92152

Dr.  David  Stone
KAJ  Software,  Inc.
3420  East  Shea  Blvd.
Suite  161
Phoenix,  AZ  85028

Cdr  Michael  Suman,  PD  303 
Naval  Training  Equipment  Center 
Code  N51,  Comptroller
Orlando,  FL  32813 

Dr.  Hariharan  Swaminathan 
Laboratory  of  Psychometric  and 
Evaluation  Research 
School  of  Education 
University  of  Massachusetts 
Amherst,  MA  01003 

Mr.  Brad  Sympson
Navy  Personnel  R&D  Center
San  Diego,  CA  92152

Dr.  John  Tangney
AFOSR/NL
Bolling  AFB,  DC  20332

Dr.  Kikumi  Tatsuoka
CERL
252  Engineering  Research
Laboratory
Urbana,  IL  61801

Dr.  Maurice  Tatsuoka
220  Education  Bldg
1310  S.  Sixth  St.
Champaign,  IL  61820

Dr.  Perry  W.  Thorndyke 
FMC  Corporation 
Central  Engineering  Labs 
1185  Coleman  Avenue,  Box  580 
Santa  Clara,  CA  95052 

Dr.  Douglas  Towne
Behavioral  Technology  Labs
1845  S.  Elena  Ave.
Redondo  Beach,  CA  90277


Dr.  Amos  Tversky 
Stanford  University
Dept.  of  Psychology
Stanford,  CA  94305 

Dr.  James  Tweeddale 
Technical  Director 
Navy  Personnel  R&D  Center
San  Diego,  CA  92152 

Dr.  Paul  Twohig 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333

Dr.  J.  Uhlaner 
Uhlaner  Consultants 
4258  Bonavita  Drive 
Encino,  CA  91436 

Headquarters,  U.  S.  Marine  Corps 
Code  MPI-20 
Washington,  DC  20380 

Dr.  Kurt  Van  Lehn
Xerox  PARC
3333  Coyote  Hill  Road
Palo  Alto,  CA  94304

Dr.  Beth  Warren
Bolt  Beranek  &  Newman,  Inc.
50  Moulton  Street
Cambridge,  MA  02138

Dr.  Edward  Wegman
Office  of  Naval  Research
Code  411
800  North  Quincy  Street
Arlington,  VA  22217-5000

Dr.  David  J.  Weiss 
N660  Elliott  Hall 
University  of  Minnesota 
75  E.  River  Road 
Minneapolis,  MN  55455 

Dr.  Keith  T.  Wescourt 
FMC  Corporation 
Central  Engineering  Labs 
1185  Coleman  Ave.,  Box  580 
Santa  Clara,  CA  95052 


Dr.  Douglas  Wetzel
Code  12
Navy  Personnel  R&D  Center
San  Diego,  CA  92152

Dr.  Barbara  White
Bolt  Beranek  &  Newman,  Inc.
10  Moulton  Street
Cambridge,  MA  02238

Dr.  Hilda  Wing
Army  Research  Institute
5001  Eisenhower  Ave.
Alexandria,  VA  22333

Dr.  Robert  A.  Wisher
U.S.  Army  Institute  for  the
Behavioral  and  Social  Sciences
5001  Eisenhower  Avenue
Alexandria,  VA  22333

Dr.  Martin  F.  Wiskoff 
Navy  Personnel  R&D  Center
San  Diego,  CA  92152 

Dr.  Frank  Withrow 
U.  S.  Office  of  Education 
400  Maryland  Ave.  SW 
Washington,  DC  20202 

Dr.  Merlin  C.  Wittrock
Graduate  School  of  Education
UCLA
Los  Angeles,  CA  90024

Mr.  John  H.  Wolfe
Navy  Personnel  R&D  Center
San  Diego,  CA  92152

Dr.  Wallace  Wulfeck,  III 
Navy  Personnel  R&D  Center
San  Diego,  CA  92152 

Dr.  Joe  Yasatuke
AFHRL/LRT
Lowry  AFB,  CO  80230

Mr.  Carl  York
System  Development  Foundation
181  Lytton  Avenue
Suite  210
Palo  Alto,  CA  94301


Dr.  Joseph  L.  Young
Memory  &  Cognitive
Processes
National  Science  Foundation
Washington,  DC  20550

Dr.  Steven  Zornetzer
Office  of  Naval  Research
Code  440
800  N.  Quincy  St.
Arlington,  VA  22217-5000

Dr.  Michael  J.  Zyda
Naval  Postgraduate  School
Code  52CK
Monterey,  CA  93943


